The indexing wars conversation has mostly focused on who can run subgraphs faster and cheaper. But Goldsky is doing something more interesting: they are arguing that the subgraph model itself is the wrong abstraction for many use cases. And after spending three weeks integrating their streaming platform, I think they might be right.
The Subgraph Model: A Quick Recap
The Graph popularized the subgraph as the standard unit of blockchain data indexing. You write a manifest that defines:
- Which smart contracts to watch
- Which events to index
- How to transform and store the data
- A GraphQL schema for querying
This batch indexing model works well for historical queries: “show me all DEX swaps in the last 24 hours” or “what is the total value locked in this lending protocol?” You write your subgraph, deploy it, wait for it to sync, and then query it via GraphQL.
But the model has fundamental limitations for real-time use cases:
1. Polling latency. Subgraphs are updated in batches as new blocks are processed, and your application polls the GraphQL endpoint to check for updates. The gap between an on-chain event occurring and your application receiving the data is therefore the indexer's processing delay plus however long the event waits for your next poll. In practice that is typically 5-15 seconds, and it can be much longer if the indexer is behind.
2. No push mechanism. The subgraph model is inherently pull-based. Your application has to ask “is there new data?” repeatedly. This wastes resources and creates unnecessary latency.
3. Query-time computation. Complex aggregations are computed at query time, which means every dashboard refresh re-executes expensive calculations. If you have 1,000 users refreshing the same protocol dashboard simultaneously, that is 1,000 identical computation passes.
4. The sync gap problem. When a new subgraph is deployed or an existing one needs to re-index, there can be hours or even days of sync time. During this period, your application either serves stale data or fails entirely.
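To put rough numbers on point 1, here is a back-of-the-envelope sketch. It assumes the indexer lag is a fixed value and that events land at a random moment within the poll interval. `indexerLagMs` and `pollIntervalMs` are illustrative parameters, not measured figures from any provider.

```javascript
// Rough staleness model for a polled subgraph.
// Assumptions (for illustration only): the indexer makes an event queryable
// a fixed `indexerLagMs` after it lands on-chain, and the client polls on a
// fixed `pollIntervalMs` timer that is not aligned with block times.
function pollingStaleness({ indexerLagMs, pollIntervalMs }) {
  return {
    // Best case: the poll fires right after the indexer catches up.
    bestMs: indexerLagMs,
    // Average case: the event waits half a poll interval on average.
    averageMs: indexerLagMs + pollIntervalMs / 2,
    // Worst case: the poll fired just before the data became available.
    worstMs: indexerLagMs + pollIntervalMs,
  };
}

const estimate = pollingStaleness({ indexerLagMs: 5000, pollIntervalMs: 5000 });
console.log(estimate);
```

With a 5-second indexer lag and a 5-second poll, this model gives 7.5 seconds on average and 10 seconds in the worst case. Polling faster only shrinks the second term; the indexer lag stays.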
Goldsky’s Streaming Paradigm
Goldsky’s approach flips the model. Instead of indexing data into a proprietary store and exposing it via GraphQL, they stream blockchain events directly into your infrastructure.
Mirror: Your Database, Their Pipeline
The Mirror product continuously replicates on-chain data into your own database – PostgreSQL, ClickHouse, BigQuery, or others. Your application reads from its own database, not an external API.
This seemingly simple change has profound implications:
- No external round trip for reads. The data is already in your database, so read latency is whatever your own database delivers rather than the response time of a third-party API.
- Full SQL flexibility. You are not constrained by GraphQL schema design. Any SQL query works.
- Pre-computed aggregations. You can create materialized views, triggers, and computed columns using standard database tools.
- No external dependency for reads. Even if Goldsky’s pipeline goes down temporarily, your application continues serving data from your database. The pipeline catches up when connectivity is restored.
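As an illustration of the pre-computed aggregation point, here is a minimal sketch using a PostgreSQL materialized view. The `swaps` table and its columns (`block_time`, `pool_address`, `amount_usd`) are hypothetical names, not the schema Mirror produces. The `db` argument is assumed to be any client with a promise-returning `query(sql)` method, which is the shape of a node-postgres `Pool`.

```javascript
// Sketch: pre-compute hourly swap volume inside PostgreSQL so dashboards read
// a small summary relation instead of re-aggregating raw rows on every request.
// The `swaps` table and its columns are hypothetical; substitute whatever
// schema your pipeline actually writes.
const CREATE_VIEW_SQL = `
  CREATE MATERIALIZED VIEW IF NOT EXISTS hourly_swap_volume AS
  SELECT date_trunc('hour', block_time) AS hour,
         pool_address,
         SUM(amount_usd) AS volume_usd
  FROM swaps
  GROUP BY 1, 2
`;
const REFRESH_VIEW_SQL = "REFRESH MATERIALIZED VIEW hourly_swap_volume";

// `db` is injected so the sketch can run against a stub without a live database.
async function refreshHourlyVolume(db) {
  await db.query(CREATE_VIEW_SQL); // does nothing if the view already exists
  await db.query(REFRESH_VIEW_SQL); // recompute the aggregate once, for all readers
}
```

You would call `refreshHourlyVolume` on a timer or after each batch of writes, and dashboards would then run a plain `SELECT` against `hourly_swap_volume`. The aggregation cost is paid once per refresh rather than once per page load.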
Webhooks: Event-Driven Architecture
Goldsky’s webhook system pushes notifications to your backend when specific on-chain events occur. Instead of polling for new data, your application receives callbacks.
For example, instead of:
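// Polling every 5 seconds
setInterval(async () => {
  try {
    const result = await graphqlClient.query(GET_RECENT_SWAPS);
    updateUI(result.data);
  } catch (err) {
    // Log and keep polling; a failed request should not kill the loop
    console.error("poll failed", err);
  }
}, 5000);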
You get:
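// Webhook handler - called when the event is delivered, no polling loop
const express = require("express");
const app = express();
app.use(express.json()); // parse JSON bodies so req.body is populated

app.post("/webhook/swap", (req, res) => {
  const swapEvent = req.body;
  notifyUser(swapEvent);
  updateAnalytics(swapEvent);
  res.status(200).send("ok");
});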
This is the same event-driven architecture that powers modern Web2 applications (Stripe webhooks, Twilio callbacks, etc.), applied to blockchain data.
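One practical wrinkle carries over from Web2 as well: webhook senders commonly retry deliveries, so the same event may arrive more than once. Below is a minimal sketch of idempotent handling. The `id` field on the payload is an assumption for illustration, not Goldsky's documented payload shape, so check what the real payload offers as a unique key.

```javascript
// Wraps an event handler so that each event id is processed at most once.
// Assumption: every payload carries a unique string `id` (hypothetical field name).
function createIdempotentHandler(handle, seen = new Set()) {
  return function onEvent(event) {
    if (event == null || typeof event.id !== "string") {
      return { status: 400, body: "missing event id" };
    }
    if (seen.has(event.id)) {
      // Duplicate delivery: acknowledge it so the sender stops retrying,
      // but do not run the side effects again.
      return { status: 200, body: "duplicate" };
    }
    handle(event);
    // Record the id only after `handle` returns, so a handler that throws
    // gets another chance on the next delivery.
    seen.add(event.id);
    return { status: 200, body: "ok" };
  };
}

// Example wiring with a stand-in handler.
const processed = [];
const onSwap = createIdempotentHandler((event) => processed.push(event.id));
onSwap({ id: "evt-1" });
onSwap({ id: "evt-1" }); // retried delivery, side effects skipped
console.log(processed);
```

Inside the route you would return the result with `res.status(result.status).send(result.body)`. The in-memory `Set` is lost on restart, so a production version would more likely lean on a unique constraint in the database.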
When Streaming Beats Batch Indexing
After three weeks of integration, here are the use cases where Goldsky’s streaming approach clearly wins:
1. Real-time notifications. “Alert me when my liquidation threshold is approaching.” With subgraphs, you poll every N seconds and hope you catch the event in time. With webhooks, you get notified within seconds of the on-chain event.
2. Live dashboards. Trading interfaces, portfolio trackers, and DeFi dashboards that need to reflect current state. Mirror keeps your database in sync continuously, so every page load shows current data.
3. Backend event processing. Automated actions triggered by on-chain events – rebalancing strategies, arbitrage execution, governance vote notifications. The webhook model eliminates the polling overhead entirely.
4. Analytics and reporting. Having blockchain data in your own PostgreSQL or ClickHouse instance means you can use standard BI tools (Metabase, Grafana, Looker) to create dashboards without any custom integration.
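For the liquidation alert in point 1, the logic behind the webhook can stay very small. The sketch below assumes a hypothetical payload with `collateralUsd`, `debtUsd`, and `liquidationThreshold` fields. It uses the common lending-protocol definition of health factor (collateral value times liquidation threshold, divided by debt), and real protocols differ in the details.

```javascript
// Health factor as commonly defined by lending protocols:
// (collateral value * liquidation threshold) / debt value.
// A position becomes liquidatable when this drops below 1.
function healthFactor({ collateralUsd, debtUsd, liquidationThreshold }) {
  if (debtUsd === 0) return Infinity; // no debt, nothing to liquidate
  return (collateralUsd * liquidationThreshold) / debtUsd;
}

// Decide whether a position update should trigger a user alert.
// `warnBelow` is an illustrative safety margin, not a protocol constant.
function shouldAlert(position, warnBelow = 1.1) {
  return healthFactor(position) < warnBelow;
}

// Field names are hypothetical; map them from whatever your event carries.
const position = { collateralUsd: 10000, debtUsd: 7500, liquidationThreshold: 0.8 };
console.log(healthFactor(position).toFixed(2), shouldAlert(position));
```

Here the health factor comes out at roughly 1.07, under the 1.1 warning margin, so the user is alerted before the position reaches the liquidation point of 1.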
When Batch Indexing Still Makes Sense
Goldsky’s streaming is not universally better. Batch indexing with subgraphs still wins for:
1. Simple, infrequent queries. If you just need to look up a token balance or check an NFT owner occasionally, spinning up a streaming pipeline is overkill.
2. Historical deep dives. Complex queries against months of historical data are better served by a fully indexed and optimized GraphQL endpoint.
3. Prototyping and development. Subgraphs are faster to set up for quick experiments. Mirror requires database infrastructure and pipeline configuration.
The Cost Question
Goldsky’s pricing is based on data volume streamed and pipeline complexity, which can get expensive for high-throughput chains. A full Ethereum mainnet mirror with all token transfers and DeFi events could cost significantly more than an equivalent Ormi subgraph.
However, the total cost of ownership calculation is different because you are running your own database. If you already have PostgreSQL or ClickHouse infrastructure, the marginal cost of adding Goldsky’s Mirror is lower than standing up a separate indexing service.
My Assessment
Goldsky is not trying to win the “faster subgraph” race. They are arguing that the subgraph abstraction is inadequate for modern application needs, and I think they are making a compelling case.
The future of blockchain data infrastructure is probably not a single paradigm. It is a combination of batch indexing (for historical queries), real-time streaming (for live data), and push-based events (for automated workflows). Teams that adopt a multi-paradigm approach will build better products.
Has anyone else experimented with Goldsky’s streaming products? I would love to hear about other teams’ experiences, especially around reliability and cost at scale.