Indexing Wars: The Graph vs Subsquid vs Self-Hosted Solutions - Which Actually Scales?

After running my team’s data pipelines on The Graph, Subsquid, and self-hosted solutions for the past year, I’ve finally gathered enough real-world data to share what actually scales in production.

The Indexing Landscape in 2026

If you’re building anything serious in Web3, you need indexed data. The question is: how do you get it? The three main approaches:

  1. The Graph (Decentralized, subgraph-based)
  2. Subsquid/SQD (Flexible squids, near zero-cost data access)
  3. Self-Hosted (Run your own indexer stack)

Each has dramatically different tradeoffs. Let me break down what we learned.

The Graph: The Pioneer That’s Showing Its Age

The Graph pioneered declarative blockchain indexing and deserves credit for making this space accessible. But here’s the reality in 2026:

What Works

  • Mature ecosystem: Huge library of existing subgraphs
  • Decentralized: GRT-based network is genuinely decentralized
  • GraphQL API: Clean query interface for frontend devs

What Doesn’t

  • Performance: Our Uniswap V3 indexing benchmarks showed The Graph running 10-50x slower than Subsquid for historical data
  • Rigidity: Subgraphs are compiled to WASM and run in a black box; making changes often requires a complete redeployment
  • Hosted Service Death: The Graph Hosted Service was fully deprecated in 2026. You’re now on the decentralized network ($$) or migrating elsewhere
// The Graph subgraph architecture (simplified)
// Data flows: Archive Node → WASM Black Box → Postgres → GraphQL

// You can't easily:
// - Batch process multiple blocks efficiently
// - Filter at the data source level
// - Transform data with custom logic outside WASM constraints

Cost Reality

We spent approximately $2,400/month for moderate DeFi dashboard queries. The GRT token economics add unpredictability—your costs fluctuate with token price and indexer competition.

Subsquid (SQD): The Performance Beast

Subsquid has emerged as the serious alternative, and the performance numbers don’t lie.

Architecture Advantage

// Subsquid architecture
// Data flows: Subsquid Network (Archive) → Squid SDK (Node.js) → Your Database → Your API

// Key differences:
// 1. Squids are regular Node.js apps - full flexibility
// 2. Batch processing: up to 150k blocks/second
// 3. Filter at source: only fetch data you need
// 4. Custom transforms: any TypeScript logic you want

import {EvmBatchProcessor} from '@subsquid/evm-processor'
import * as factoryAbi from './abi/factory' // generated with `sqd typegen`

// Uniswap V3 factory on Ethereum mainnet
const UNISWAP_V3_FACTORY = '0x1f98431c8ad98523631ae4a59f267346ea31f984'

const processor = new EvmBatchProcessor()
  .setGateway('https://v2.archive.sqd.io/ethereum-mainnet')
  .setRpcEndpoint(process.env.RPC_URL)
  .addLog({
    address: [UNISWAP_V3_FACTORY],
    topic0: [factoryAbi.events.PoolCreated.topic],
  })
  .setFields({
    log: {
      transactionHash: true,
    },
  })

Our Benchmarks

| Task | The Graph | Subsquid | Self-Hosted |
|---|---|---|---|
| Uniswap V3 full index | 12+ hours | 45 minutes | 3-4 hours |
| Daily incremental sync | 2-3 min lag | <30 sec lag | ~1 min lag |
| Complex aggregations | Timeout issues | Handled | Depends on setup |
| Monthly cost (moderate load) | $2,400 | $400-800 | $1,200+ (infra) |

What I Love

  • Speed: Near zero-cost access to historical data
  • Flexibility: It’s just Node.js. Debug, log, transform however you want
  • Multi-chain: Same squid can index 100+ networks
  • Subsquid Cloud: Deploy with zero-downtime migrations

What Concerns Me

  • Data Freshness: There’s a delay between an on-chain event and its appearance in indexed data: the centralized archive has to ingest and pre-index new blocks before your squid can process them. For real-time apps, this matters
  • Learning Curve: More flexible = more decisions to make
  • Younger Ecosystem: Fewer ready-made templates than The Graph
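On the freshness point, the lag is at least easy to quantify: compare the chain head reported by your RPC with the highest block your squid has committed. A minimal sketch (the function and threshold are mine; tune the default per chain):

```typescript
interface LagReport {
  blocksBehind: number
  acceptable: boolean
}

// Compare chain head (from RPC) with the last block the indexer committed.
// maxLagBlocks = 25 is roughly 5 minutes on Ethereum at 12s block times.
function checkIndexerLag(
  chainHead: number,
  lastIndexed: number,
  maxLagBlocks: number = 25,
): LagReport {
  const blocksBehind = Math.max(0, chainHead - lastIndexed)
  return { blocksBehind, acceptable: blocksBehind <= maxLagBlocks }
}

// checkIndexerLag(21000100, 21000090) → { blocksBehind: 10, acceptable: true }
```

We run this on a cron and alert on lag, not just on availability.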

Self-Hosted: The Control Freak’s Choice

For the brave (or budget-constrained), self-hosted indexing is always an option.

When It Makes Sense

  1. Regulatory requirements: Some enterprises can’t send data to third parties
  2. Custom data needs: Highly specialized indexing logic
  3. Cost optimization at scale: If you’re indexing 10+ chains heavily

Our Stack

# Self-hosted indexing infrastructure
services:
  # Archive node (Erigon for Ethereum)
  archive-node:
    image: thorax/erigon:latest
    volumes:
      - ./data/erigon:/data
    # Needs 2TB+ SSD, serious hardware

  # Custom indexer
  indexer:
    build: ./indexer
    depends_on:
      - archive-node
      - postgres
    environment:
      - RPC_URL=http://archive-node:8545
      - DATABASE_URL=postgresql://postgres:5432/indexer

  # TimescaleDB for time-series queries
  postgres:
    image: timescale/timescaledb:latest
    volumes:
      - ./data/postgres:/var/lib/postgresql/data

Real Costs

| Component | Monthly Cost |
|---|---|
| Archive Node (Ethereum) | $300-500 (dedicated server) |
| Database (managed) | $200-400 |
| Compute (indexer) | $100-200 |
| DevOps Time | Priceless :sweat_smile: |
| Total | $600-1,100 + time |

The Hidden Costs

What nobody tells you about self-hosting:

  • Chain upgrades break things: Pectra just shipped. Your indexer probably needs updates
  • Data consistency: One missed block = corrupted state. Rebuilding is painful
  • On-call duties: Archive node goes down at 3 AM? That’s your problem
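On the missed-block point: a periodic gap check against your own tables catches corruption before it compounds. A minimal sketch (how you fetch the indexed block numbers is up to you):

```typescript
// Find contiguous runs of missing block numbers in an indexed set.
// Returns inclusive [start, end] ranges of blocks that were never written.
function findBlockGaps(indexedBlocks: number[]): Array<[number, number]> {
  const sorted = [...indexedBlocks].sort((a, b) => a - b)
  const gaps: Array<[number, number]> = []
  for (let i = 1; i < sorted.length; i++) {
    if (sorted[i] - sorted[i - 1] > 1) {
      gaps.push([sorted[i - 1] + 1, sorted[i] - 1])
    }
  }
  return gaps
}

// findBlockGaps([100, 101, 104, 105]) → [[102, 103]]
```

Any non-empty result triggers a targeted backfill of just that range, which is far cheaper than a full rebuild.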

My Recommendation Matrix

| Scenario | Recommendation |
|---|---|
| Startup, need to ship fast | Subsquid Cloud |
| Decentralization matters deeply | The Graph Network |
| Enterprise with compliance needs | Self-hosted + Subsquid SDK |
| Real-time trading/MEV | Self-hosted with custom stack |
| Multi-chain aggregator | Subsquid (built for this) |
| Querying existing protocol data | The Graph (if a subgraph exists) |

The Consolidation Trend

Something worth watching: the RPC provider market is consolidating, and indexing is following. Smaller providers are disappearing. If your favorite indexing solution is a small startup, have a backup plan.

The winners seem to be:

  • Subsquid: Technical performance, dev experience
  • The Graph: Decentralization narrative, ecosystem effects
  • Space and Time: Enterprise play, SQL compatibility

Questions for the Community

  1. What’s your latency requirement? If you need sub-second data freshness, most indexers won’t cut it
  2. How much customization do you need? Subgraph constraints vs squid flexibility
  3. What’s your team’s skill set? GraphQL-first vs TypeScript-first

I’m genuinely curious what others are using. Are you rolling your own, paying for hosted solutions, or doing something hybrid?


Data collected from production workloads indexing Ethereum, Arbitrum, and Base. Your mileage may vary.

This benchmarking data is gold. I’ve been running similar comparisons at our analytics company and can validate most of Mike’s findings.

Adding Some Data Points

We index about 15 EVM chains for our DeFi dashboards. Here’s what I’ve found:

Subsquid’s Batching Is Game-Changing

The performance difference isn’t just incremental—it’s architectural. The Graph processes events one by one through WASM. Subsquid’s batch processor can request data for thousands of blocks in a single call:

// This is why Subsquid is fast
// This is why Subsquid is fast
import {TypeormDatabase} from '@subsquid/typeorm-store'

processor.run(new TypeormDatabase(), async (ctx) => {
  // ctx.blocks contains hundreds of blocks' worth of data
  // Process them all in one batch with full TypeScript power
  const entities: Swap[] = [] // accumulate across the whole batch

  for (const block of ctx.blocks) {
    for (const log of block.logs) {
      // Your transform logic here, e.g. entities.push(decodeSwap(log))
    }
  }

  // One database transaction for the entire batch
  await ctx.store.save(entities)
})

With The Graph, you’re limited by the WASM execution model. Each event triggers a handler, and you can’t easily batch database operations.

The Hidden The Graph Problem: Subgraph Quality

What Mike didn’t mention: not all subgraphs are created equal. I’ve seen:

  • Abandoned subgraphs: Original dev left, nobody maintains it
  • Incorrect mappings: Data looks right but has subtle bugs
  • Version conflicts: Multiple versions indexed differently

We once spent two weeks debugging why our TVL numbers didn’t match DeFiLlama. Turns out the community subgraph had a rounding error in its liquidity calculations.

Lesson learned: Always audit the subgraph code, even “official” ones.
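After that incident, our audit includes an automated cross-check of indexed aggregates against an independent source (DeFiLlama in our case). A sketch of the comparison helper; the basis-point threshold is our own choice:

```typescript
// Check whether an indexed aggregate (e.g. TVL) is within `bps` basis
// points of an independent reference value. 50 bps = 0.5% deviation.
function withinToleranceBps(indexed: number, reference: number, bps: number): boolean {
  if (reference === 0) return indexed === 0
  const deviation = Math.abs(indexed - reference) / Math.abs(reference)
  return deviation * 10_000 <= bps
}

// withinToleranceBps(1_002_000, 1_000_000, 50) → true  (0.2% off: fine)
// withinToleranceBps(1_100_000, 1_000_000, 50) → false (10% off: investigate)
```

A failing check doesn't tell you which side is wrong, but it tells you to go read the mapping code before your users do.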

Cost Breakdown Accuracy

Mike’s cost numbers align with ours:

| Service | Our Monthly Cost | Notes |
|---|---|---|
| The Graph Network | $1,800-3,000 | Varies with GRT price |
| Subsquid Cloud | $500-700 | Predictable pricing |
| Self-hosted (partial) | $800 | Just critical chains |

The GRT volatility is a real business problem. Budgeting is impossible when your indexing costs are denominated in a token that can swing 30% in a week.
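You can at least budget as a band rather than a point estimate. A toy helper with illustrative numbers (the GRT amount and spot price below are made up):

```typescript
// Rough budgeting band for costs denominated in a volatile token.
// swingFraction = 0.3 reflects the ~30% weekly swings mentioned above.
function monthlyCostBandUsd(
  grtPerMonth: number,
  spotPriceUsd: number,
  swingFraction: number = 0.3,
): { low: number; high: number } {
  const midpoint = grtPerMonth * spotPriceUsd
  return {
    low: midpoint * (1 - swingFraction),
    high: midpoint * (1 + swingFraction),
  }
}

// e.g. 10,000 GRT/month at a $0.20 spot price → roughly a $1,400-$2,600 band
```

We budget to the high end of the band and treat anything under it as slack, which at least makes the finance conversation possible.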

When The Graph Still Wins

That said, The Graph has advantages:

  1. Existing subgraphs: Need Uniswap data? Someone already built it
  2. Community support: More Stack Overflow answers, more tutorials
  3. Decentralization story: If you’re building for the “decentralize everything” crowd, it matters

For our public-facing dashboards, we actually use The Graph for read-only data that doesn’t need real-time freshness. It’s “good enough” for many use cases.

My Hybrid Approach

We ended up with a hybrid architecture:

┌─────────────────┐     ┌─────────────────┐
│  Critical Data  │     │  General Data   │
│   (Subsquid)    │     │  (The Graph)    │
└────────┬────────┘     └────────┬────────┘
         │                       │
         └───────────┬───────────┘
                     │
              ┌──────▼──────┐
              │   Unified   │
              │   GraphQL   │
              │   Gateway   │
              └─────────────┘
  • Critical paths (liquidations, MEV, real-time prices): Subsquid
  • Historical analytics (TVL charts, volume history): The Graph or cached Subsquid
  • One API surface: Apollo Federation stitches it together

This gives us 90% of the benefits of both systems without fully committing to either.
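The routing rule behind the gateway is simple enough to state as a function. This is a sketch of the decision logic only; the real wiring is Apollo Federation, and the type names here are mine:

```typescript
type Backend = 'subsquid' | 'thegraph'

interface QueryProfile {
  realtime: boolean   // needs fresh data: liquidations, prices
  historical: boolean // long lookbacks, heavy aggregation
}

// Route critical-path queries to the low-lag backend, historical
// analytics to The Graph (or cached Subsquid results).
function pickBackend(q: QueryProfile): Backend {
  if (q.realtime) return 'subsquid'
  if (q.historical) return 'thegraph'
  return 'subsquid' // default to the faster path
}
```

Keeping the rule this explicit means that when one backend degrades, failing over is a one-line change rather than an architecture discussion.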

Question for Mike

What’s your strategy for handling chain reorgs? We’ve had issues where Subsquid’s archive sometimes serves data from orphaned blocks, causing temporary inconsistencies. How do you handle that in production?

Great technical deep-dive. As someone who’s been running distributed systems for over a decade (pre-blockchain!), I want to add the operational perspective that often gets overlooked.

The “Boring” Reliability Factor

Performance benchmarks are sexy. Uptime isn’t. But in production, reliability trumps raw speed every time.

The Graph: Battle-Tested but Brittle

The Graph’s decentralized network has survived multiple market cycles. That’s worth something. But I’ve also seen:

  • Indexer instability: Sometimes indexers just… stop. Your queries fail until they restart
  • Query routing issues: The gateway occasionally routes to slow or outdated indexers
  • Slow reindexing: When a subgraph needs to resync, you’re waiting days

We had a production incident where a major indexer went offline during a DeFi exploit. Our dashboards went blank right when users needed them most.

Subsquid: Fast but Young

Subsquid Cloud has been reliable in my experience, but it’s a younger service. A few concerns:

  1. Single-vendor risk: The network is less decentralized than The Graph
  2. Smaller team: Fewer engineers on call for emergencies
  3. Rapid changes: Good for features, challenging for stability

That said, their support has been responsive. I’ve gotten issues resolved faster than with The Graph Foundation.

Self-Hosted: You Break It, You Buy It

Running your own archive node is an exercise in humility. Here’s what I’ve learned:

# Things that have broken my self-hosted indexer:
- Disk I/O saturation (archive nodes are HUNGRY)
- OOM kills during reorg handling
- Network timeouts during large batch processes
- Postgres vacuuming at the wrong time
- That one time I fat-fingered a config and wiped the database

The “priceless DevOps time” Mike mentioned is real. Budget at least 10-15 hours/month for maintenance, more during chain upgrades.

Database Decisions Matter More Than Indexer Choice

Something that doesn’t get enough attention: your choice of database affects performance more than your choice of indexer.

TimescaleDB for Time-Series

If you’re doing historical analytics, TimescaleDB is a game-changer:

-- Without TimescaleDB: query takes 45 seconds
SELECT
  time_bucket('1 hour', block_timestamp) as hour,
  SUM(volume) as hourly_volume
FROM swaps
WHERE block_timestamp > NOW() - INTERVAL '30 days'
GROUP BY hour;

-- With TimescaleDB hypertable: same query takes 200ms
-- Automatic partitioning + compression = magic

PostgreSQL Optimization

Regardless of indexer, these PostgreSQL settings make a huge difference:

# postgresql.conf for indexer workloads
shared_buffers = 4GB
effective_cache_size = 12GB
work_mem = 256MB
maintenance_work_mem = 1GB
random_page_cost = 1.1  # SSD assumption
effective_io_concurrency = 200

# For heavy write workloads (indexing)
wal_buffers = 64MB
checkpoint_completion_target = 0.9
max_wal_size = 4GB

My Reliability Checklist

Before deploying any indexer to production:

  • Graceful degradation: What happens when the indexer is down? Cached data? Error page?
  • Monitoring: Alert on lag, not just availability
  • Backup strategy: Can you rebuild from scratch if needed? How long?
  • Rollback plan: New subgraph/squid breaks? How fast can you revert?
  • Load testing: Test at 10x expected traffic

Pragmatic Recommendation

For most teams, I’d say:

  1. Start with Subsquid Cloud for speed and flexibility
  2. Keep The Graph as a fallback for non-critical paths
  3. Only self-host if you have dedicated DevOps and regulatory/cost requirements

The “best” solution is the one your team can actually operate reliably. A slower, stable system beats a fast, flaky one every time.

What’s everyone’s monitoring setup for indexer health? We use Datadog with custom metrics, but curious if there are blockchain-specific tools people recommend.

This thread is incredibly helpful. I’ve been evaluating indexing solutions for our yield aggregator and the real-world data here is exactly what I needed.

The DeFi-Specific Perspective

For DeFi applications, the indexing requirements are different from general analytics. Here’s what matters most:

Speed Kills (Literally, Your Users’ Funds)

In DeFi, data latency isn’t just a UX problem—it’s a safety issue. Consider:

  • Liquidation dashboards: Users need to know their health factor in real-time
  • MEV protection: If your data is stale, arbitrageurs will front-run your users
  • Oracle comparisons: Verifying price feeds requires sub-second data

Mike mentioned Subsquid’s data freshness concern. For our yield bots, we actually can’t use centralized archives for critical paths:

// Our hybrid approach for yield optimization
class YieldMonitor {
  // Real-time: Direct RPC for critical data
  async getCurrentYield(pool: Address): Promise<number> {
    // Can't afford indexer lag here
    return await this.rpcProvider.call(pool, 'getYield')
  }

  // Historical: Subsquid for trend analysis
  async getYieldHistory(pool: Address, days: number): Promise<YieldPoint[]> {
    // Subsquid is perfect for this - speed + cost
    return await this.squidClient.query(YIELD_HISTORY_QUERY, { pool, days })
  }
}

Smart Contract Events Are Incomplete

One thing that frustrates me about all indexers: they assume events tell the whole story. They don’t.

Many protocols have:

  • View functions that return computed state
  • Internal state not emitted as events
  • Cross-contract calls that are hard to trace

For our impermanent loss tracker, we need to call getReserves() at historical blocks, not just index Sync events. This requires RPC access, which Subsquid handles better than The Graph (you can mix archive data with RPC calls).
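For what it's worth, once you have the reserve snapshots, the IL calculation itself is just the standard constant-product formula. A sketch, with `priceRatio` meaning the relative price change of the pair since entry:

```typescript
// Impermanent loss for a constant-product (x*y=k) pool:
//   IL = 2*sqrt(r) / (1 + r) - 1
// where r = currentPrice / entryPrice. Negative = loss vs. just holding.
function impermanentLoss(priceRatio: number): number {
  return (2 * Math.sqrt(priceRatio)) / (1 + priceRatio) - 1
}

// impermanentLoss(1) → 0      (no price change, no IL)
// impermanentLoss(4) → ≈ -0.2 (a 4x relative move ≈ 20% loss vs. holding)
```

The indexer supplies the Sync events that tell us *when* to snapshot; the snapshots themselves come from historical RPC calls.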

The Multi-Chain Nightmare

We aggregate yields across 12 chains. Each indexer handles multi-chain differently:

| Indexer | Multi-Chain Support | Notes |
|---|---|---|
| The Graph | Separate subgraphs per chain | A pain to maintain 12 subgraphs |
| Subsquid | Single squid, multiple chains | Much cleaner architecture |
| Self-hosted | Maximum flexibility | Maximum pain |

Subsquid’s ability to index multiple chains in a single project has saved us countless hours:

// One squid, multiple chains
const processor = new EvmBatchProcessor()

// Configure different gateways per chain
if (process.env.CHAIN === 'ethereum') {
  processor.setGateway('https://v2.archive.sqd.io/ethereum-mainnet')
} else if (process.env.CHAIN === 'arbitrum') {
  processor.setGateway('https://v2.archive.sqd.io/arbitrum')
}
// Same codebase, different deployments

Cost Matters for DeFi Margin

DeFi protocols often operate on thin margins. Our yield aggregator takes a 0.5% fee. Every dollar spent on indexing comes directly out of that margin.

Mike’s cost comparison convinced me to prioritize Subsquid for our primary workloads. The $1,600/month difference between The Graph and Subsquid is real money.
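To make that concrete: at a 0.5% fee, a $1,600/month saving equals the entire fee revenue on $320k of volume. Trivial arithmetic, but it framed the decision for us:

```typescript
// How much monthly volume does a given cost saving represent,
// at a given protocol fee rate? Numbers below are from the thread.
function volumeToCoverCost(monthlyCostUsd: number, feeRate: number): number {
  return monthlyCostUsd / feeRate
}

// volumeToCoverCost(1600, 0.005) → 320,000
// i.e. the indexing saving equals the fees on $320k of monthly volume
```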

What I Wish Indexers Would Add

  1. Native price oracles: Stop making us index Chainlink separately
  2. Standardized DeFi schemas: Common interfaces for swaps, pools, vaults
  3. Alert triggers: Notify me when TVL drops 20%, not just serve queries
  4. Simulated transactions: “What would happen if I made this trade?”
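Until indexers ship alert triggers natively, we bolt them on ourselves. A minimal sketch of item 3, comparing two TVL samples; the threshold name and default are my own:

```typescript
// Fire an alert when TVL drops by more than `dropThreshold` (as a
// fraction) between two samples. 0.2 = the 20% drop mentioned above.
function tvlDropAlert(
  previousTvl: number,
  currentTvl: number,
  dropThreshold: number = 0.2,
): boolean {
  if (previousTvl <= 0) return false // no baseline, nothing to compare
  return (previousTvl - currentTvl) / previousTvl >= dropThreshold
}

// tvlDropAlert(10_000_000, 7_500_000) → true  (25% drop)
// tvlDropAlert(10_000_000, 9_000_000) → false (10% drop)
```

In practice this runs against each new batch the squid commits, so the alert lag is bounded by the indexer lag.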

Question

Has anyone successfully used indexers for real-time liquidation monitoring? We’re considering building a custom solution but wondering if others have made standard indexers work for this use case.

Love this thread! As someone who came from the traditional tech world and learned Web3 the hard way, I want to share the developer experience perspective.

The Learning Curve Reality

When I started in Web3, I had no idea what blockchain indexing even meant. Here’s what the learning experience looks like for each option:

The Graph: Familiar but Frustrating

The Graph uses GraphQL, which most web devs already know. That’s a big plus. But:

Day 1-3: “Cool, I know GraphQL, this should be easy”
Day 4-7: “Why is my subgraph taking 12 hours to deploy?”
Week 2: “The mapping got stuck, how do I debug WASM?”
Month 2: “I need to change one field and redeploy everything???”

The WASM black box is the biggest friction point. When something goes wrong, debugging is painful:

// The Graph debugging experience
// Step 1: Add log statements (limited logging in WASM)
// Step 2: Deploy (wait 2-4 hours)
// Step 3: Check logs (often unhelpful)
// Step 4: Repeat

// You can't:
// - Set breakpoints
// - Inspect state easily
// - Run locally without special tooling

Subsquid: Steeper Start, Smoother Ride

Subsquid requires understanding more concepts upfront, but it’s just TypeScript:

// Subsquid debugging experience
// Step 1: Add console.log or actual debugger
// Step 2: Run locally with `sqd run`
// Step 3: See output immediately
// Step 4: Fix and test

// Standard Node.js debugging works:
// - VS Code breakpoints
// - Chrome DevTools
// - Good old console.log

The tradeoff: you need to understand more about how indexing works. There’s no magic abstraction hiding the complexity.

Self-Hosted: PhD Required

I tried self-hosting before I really understood what I was doing. Don’t do this. You need to understand:

  • Archive node operation
  • Database optimization
  • Reorg handling
  • Backfill strategies

Unless you have someone with serious infrastructure experience, avoid this path initially.

Documentation Quality

This matters more than people admit:

| Indexer | Docs Quality | Notes |
|---|---|---|
| The Graph | Good basics, sparse advanced | Many community tutorials |
| Subsquid | Excellent technical docs | Growing cookbook section |
| Self-hosted | Stack-specific | You're reading the source code |

Subsquid’s docs have improved dramatically in 2025-2026. They actually explain why things work, not just how.

The Frontend Integration Story

As a full-stack dev, I care about how indexers integrate with my React apps:

The Graph + Apollo Client

// The Graph is GraphQL-native, so Apollo just works
import { useQuery } from '@apollo/client'

function PoolData({ address }) {
  const { data, loading } = useQuery(GET_POOL, {
    variables: { address },
    pollInterval: 30000, // Poll for updates
  })

  // Standard Apollo patterns apply
}

Subsquid + Custom Client

// Subsquid also serves GraphQL, but you might want custom fetching
import { useQuery } from '@tanstack/react-query'

function PoolData({ address }) {
  const { data, isLoading } = useQuery({
    queryKey: ['pool', address],
    queryFn: () => fetchFromSquid(address),
    refetchInterval: 10000,
  })
}

// Or use their generated GraphQL client
// sqd codegen generates typed client from your schema

Both work fine for frontend integration. The main difference is ecosystem—more Apollo examples exist for The Graph.

My Recommendation for New Devs

  1. If you know GraphQL: Start with The Graph using an existing subgraph. Get familiar with the query patterns
  2. If you need to build custom: Go straight to Subsquid. The TypeScript DX is worth learning
  3. Skip self-hosted initially: There’s enough to learn without adding infrastructure complexity

Security Note

Since I work in DeFi security, I have to mention: indexers are off-chain. Never trust indexed data for on-chain decisions without verification.

// ❌ Dangerous: Trust indexed data for liquidation
const healthFactor = await indexer.getHealthFactor(user)
if (healthFactor < 1) liquidate(user)

// ✅ Safe: Verify on-chain before acting
const healthFactor = await contract.getHealthFactor(user)
if (healthFactor < 1) liquidate(user)

Indexers can have bugs, lag, or be manipulated. Always verify critical data on-chain.


Bookmarking this thread. Great resource for anyone evaluating their data stack!