🔌 Token2049 Day 1: The RPC Infrastructure Crisis Nobody's Talking About

Just walked out of the “Infrastructure at Scale” panel at Token2049 Singapore. Mind blown.

Everyone’s talking about AI, RWAs, and DeFi. But there’s a crisis brewing in the foundation layer that most people are ignoring: RPC infrastructure is breaking.

The Panel That Opened My Eyes

Speakers:

  • Infrastructure lead from a Top 5 CEX
  • CTO of a major DeFi protocol ($100M+ TVL)
  • Founder of an RPC provider (anonymized, under NDA)
  • My role: operating RPC nodes for mid-size projects

The uncomfortable truth they shared: “We’re one outage away from a cascade failure across multiple protocols.”

The Numbers That Shocked Me

The anonymous RPC provider shared their internal metrics:

2023 landscape:

  • ~50 serious RPC providers globally
  • Average response time: 200-500ms
  • Downtime acceptable: 99% uptime was “good enough”
  • Cost per request: Not heavily optimized

2025 reality (what we learned at Token2049):

  • 197+ RPC node providers competing in the space
  • Response time requirements: <50ms median (some protocols demand <20ms)
  • Downtime unacceptable: 99.9% is minimum, 99.99% is expected
  • Cost pressure: Race to the bottom on pricing

Why the change?

Layer-2 rollups. DeFi composability. MEV. Real-time trading.

The infrastructure engineer from the CEX put it bluntly: “Our users expect their transaction to confirm before they lift their finger off the phone. That means every component in the stack needs to be sub-50ms. RPC calls are the biggest bottleneck.”

The Bitcoin Node Problem

This one hit close to home. I run Bitcoin nodes.

The challenge: The Bitcoin network has grown to 21,800+ reachable nodes worldwide. Sounds great for decentralization, right?

The problem: Most of these nodes are SLOW. Old hardware, residential internet connections, inconsistent uptime.

What the panel revealed:

A major DeFi protocol trying to build Bitcoin-wrapped assets tested 100 different Bitcoin RPC endpoints:

  • 47 had >500ms response times
  • 23 had reliability issues (>5% failed requests in 24hr period)
  • 12 had data inconsistencies (different chain tips!)
  • Only 18 were production-grade

Their solution: They now run their own Bitcoin nodes in 7 geographic regions. Cost: $80K/month in infrastructure + 2.5 FTEs managing them.

My reaction: This is insane. We shouldn’t need this much redundancy for basic RPC access.

The Centralization Paradox

Here’s the part that made me uncomfortable:

DePIN founder (from audience question): “Isn’t the proliferation of 197+ RPC providers a good thing? More decentralization?”

Panel response (CTO): “It’s a mirage. 80% of production traffic goes through 5 providers. The other 192 are fighting for scraps or serving hobbyists.”

The centralization problem:

  • Infura (ConsenSys)
  • Alchemy
  • QuickNode
  • Ankr
  • BlockEden (hey, that’s us!)

These 5 handle the vast majority of Ethereum, Polygon, and BSC traffic. If two of them go down simultaneously, the entire DeFi ecosystem has a crisis.

Why the consolidation?

  1. Performance requirements - Only well-funded providers can afford global CDN infrastructure
  2. Reliability demands - 99.99% uptime requires serious engineering
  3. Multi-chain support - Projects want one provider for 20+ chains, not 20 different providers

The panel’s conclusion: “We’re recreating the cloud provider oligopoly (AWS/Azure/GCP) but for blockchain infrastructure.”

The Latency Arms Race

This is where it gets technical.

The new standard: Sub-50ms RPC response times. Some protocols demanding sub-20ms.

How do you achieve this?

Geographic distribution:

  • 7+ regions globally (US East, US West, EU, Asia-Pacific, etc.)
  • Edge caching where possible
  • Intelligent routing to nearest healthy node

Infrastructure optimization:

  • NVMe SSDs (spinning disks are dead)
  • 10Gbps+ network interfaces
  • Dedicated hardware (no cloud VMs for production nodes)
  • Archive nodes run separately (different performance characteristics from full nodes)

Load balancing intelligence:

  • Health checks every 5 seconds
  • Automatic failover in <2 seconds
  • Request routing based on: latency, load, chain sync status
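To make the routing piece concrete, here's a minimal sketch of latency-based endpoint selection over a health-checked table. The region names and URLs are made up for illustration; a real router would also weigh load and chain sync status, as the list above says, and a probe loop would refresh health and latency every few seconds.

```python
# Hypothetical multi-region endpoint table. URLs are placeholders, not real
# provider endpoints. "healthy" and "latency_ms" would be refreshed by a
# background probe loop (the panel's "health checks every 5 seconds").
ENDPOINTS = {
    "us-east":      {"url": "https://rpc-us-east.example.com", "healthy": True,  "latency_ms": 30},
    "eu-west":      {"url": "https://rpc-eu-west.example.com", "healthy": True,  "latency_ms": 55},
    "ap-southeast": {"url": "https://rpc-ap.example.com",      "healthy": False, "latency_ms": 20},
}

def pick_endpoint(endpoints):
    """Route to the healthy endpoint with the lowest observed latency.
    Unhealthy endpoints are skipped even if they look fast on paper."""
    healthy = {name: e for name, e in endpoints.items() if e["healthy"]}
    if not healthy:
        raise RuntimeError("no healthy RPC endpoints")
    return min(healthy, key=lambda name: healthy[name]["latency_ms"])
```

Note that `ap-southeast` would win on raw latency but gets skipped because its health check failed; that's the whole point of automatic failover.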

The cost: One panelist mentioned their infrastructure costs $500K+/month for a top-tier multi-chain RPC service.

For context: That’s more than many protocols spend on smart contract development.

The MEV Protection Dilemma

This came up and it’s fascinating:

Problem: Public RPC endpoints leak transaction data. MEV bots frontrun users.

Solution 1: Private mempools (Flashbots, Eden Network)

  • Great for users
  • But: Adds complexity and potential centralization

Solution 2: RPC providers with MEV protection

  • Bundle transactions
  • Direct submission to block builders
  • Skip public mempool

The tension: RPC providers want to offer MEV protection. But integrating with Flashbots/block builders adds 20-50ms latency.

Trade-off: Protection vs. Speed.

Most users don’t realize this trade-off exists. Panel consensus: “We need better user education about what they’re opting into.”

What I’m Taking Away

As someone who operates RPC infrastructure, here’s what I learned:

1. The bar is rising fast

Sub-50ms response times are becoming table stakes. If you can’t hit this consistently, you’re not competitive.

2. Multi-chain is mandatory

Single-chain RPC providers are dying. Projects want one endpoint for Ethereum, Polygon, Arbitrum, Base, etc.

3. Archive node economics are broken

Archive nodes (full historical state) are EXPENSIVE to run. But users expect them for free. Unsustainable.

Pricing models need to change: Query-based pricing, tiered access, or specialized archive-as-a-service.

4. Decentralization is hard at scale

197 providers sounds decentralized. But in practice, high reliability + low latency + multi-chain support = expensive = only big players survive.

5. The infrastructure layer is undervalued

DeFi protocols get millions in VC funding. RPC providers struggle to charge enough to cover costs.

One panelist: “We’re the roads and bridges. Everyone uses us. Nobody wants to pay for maintenance.”

My Questions for the Community

I run RPC nodes. I live this daily. But I want to hear from others:

For developers:

  • What’s your RPC pain point? Reliability? Latency? Cost?
  • Do you run your own nodes or use providers?
  • What would make you switch providers?

For RPC operators:

  • How are you handling the sub-50ms latency requirements?
  • What’s your multi-chain strategy?
  • Are you profitable or subsidizing with VC money?

For everyone:

  • Should RPC infrastructure be a public good (funded by protocols/foundations)?
  • Or should it be a competitive market (users pay for performance)?

Token2049 opened my eyes. The infrastructure crisis is real. And we need to talk about it.

Sources:

  • Token2049 Singapore 2025 “Infrastructure at Scale” panel (Oct 1, Day 1)
  • Crypto APIs “Reliable Bitcoin RPC Nodes: What to Look For in 2025”
  • CompareNodes.com “197 Providers for RPC Nodes and Blockchain APIs”
  • Panel discussions with CEX infrastructure lead, DeFi CTO, RPC provider founder
  • Personal experience operating RPC nodes

@infra_hans This hits way too close to home. I’m an infrastructure engineer at a blockchain project and we’ve been fighting the RPC battle for 18 months.

Our RPC Infrastructure Journey (A Cautionary Tale)

January 2024: Launched our L2 protocol. Used Infura for RPC. Easy, fast, worked great.

March 2024: Traffic grew 10x. Infura bill: $8K/month. Started looking at alternatives.

June 2024: Switched to “cheaper” provider. Saved $4K/month. Victory!

July 2024: 3 outages in one month. Users couldn’t access dApp. Twitter exploded.

August 2024: Emergency project: built our own RPC infrastructure. 3 engineers, 6 weeks. Cost to build: $150K.

October 2025 (now): Running 15 nodes across 5 regions. Cost: $25K/month + 1.5 FTEs maintaining them.

Total cost of “saving money”: $150K upfront + $420K/year ongoing = Way more than just paying Infura.

Lesson learned: Infrastructure is not the place to cut costs.

The Technical Reality You Mentioned

You talked about sub-50ms latency requirements. Let me share our actual data:

Our RPC performance (last 30 days):

Self-hosted nodes (5 regions):

  • P50 latency: 35ms
  • P95 latency: 120ms
  • P99 latency: 450ms (!!!)
  • Uptime: 99.87%

Commercial provider (Alchemy backup):

  • P50 latency: 25ms
  • P95 latency: 65ms
  • P99 latency: 180ms
  • Uptime: 99.97%
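If you want to reproduce numbers like these from your own request logs, percentiles are just sorted latency samples. A nearest-rank sketch (the sample data below is invented purely for illustration):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample >= p% of the data."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Fabricated latency distribution: mostly fast, a slow tail (in milliseconds).
latencies_ms = [20] * 50 + [60] * 45 + [400] * 4 + [900]
p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))
```

The P99 is the number to watch: one slow sample in a hundred is enough to ruin it, which is why our 450ms P99 hurts so much more than the 35ms P50 suggests.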

Analysis:

Our self-hosted infrastructure is WORSE than commercial providers on every metric. And it costs more (when you include engineering time).

Why?

Because Alchemy has:

  • 20+ engineers optimizing infrastructure
  • Global CDN with 50+ points of presence
  • Automatic failover tested continuously
  • Economies of scale (thousands of customers)

We have:

  • 1.5 engineers (mostly firefighting)
  • 5 regions (not enough)
  • Manual failover (takes 10-15 minutes)
  • No economies of scale (just us)

The uncomfortable truth: For most projects, self-hosting RPC is a money pit.

The Bitcoin Node Problem You Mentioned

You said the DeFi protocol tested 100 Bitcoin RPC endpoints and only 18 were production-grade.

We experienced this EXACT problem building a Bitcoin bridge.

What we learned:

Problem 1: Inconsistent chain state

We queried 5 different Bitcoin RPC endpoints for the same block:

  • 3 returned the block immediately
  • 1 was 2 blocks behind
  • 1 timed out

Which one do we trust?

Our solution: Query 3 endpoints simultaneously, use consensus. But this triples our RPC calls (and costs).
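The consensus trick is simple to sketch. Here I'm assuming each endpoint reports its current block height, with `None` standing in for a timeout; this is an illustration of the approach, not our production code.

```python
from collections import Counter

def consensus_tip(responses, quorum=2):
    """Pick the chain tip reported by at least `quorum` endpoints.
    `None` entries model timed-out endpoints. Returns None if no value
    reaches quorum, which callers should treat as "retry, don't trust"."""
    votes = Counter(r for r in responses if r is not None)
    if not votes:
        return None
    tip, count = votes.most_common(1)[0]
    return tip if count >= quorum else None
```

The cost tripling follows directly: every logical read fans out to three physical RPC calls before you get one answer you trust.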

Problem 2: Pruned vs Archive nodes

Most Bitcoin nodes are PRUNED (only keep recent blocks). But we need archive data for verifying historical deposits.

Finding archive Bitcoin nodes:

  • Public archives: Slow (500ms+ latency) or unreliable
  • Commercial providers: $500-2000/month JUST for Bitcoin archive access
  • Self-hosting: 650GB+ disk space, expensive hardware

We ended up paying for commercial Bitcoin archive access. No way around it.

Problem 3: Mempool inconsistency

Different Bitcoin nodes have different mempools (pending transactions). This matters for fee estimation and transaction confirmation predictions.

We query 5 nodes, get 5 different mempool sizes. Which is “truth”?

Answer: There is no single truth. Mempool is eventually consistent.

The Centralization Problem is WORSE Than You Think

You mentioned 80% of traffic goes through 5 providers. I think it’s even more concentrated.

Our analysis (looking at public Ethereum RPC usage):

Estimated market share (based on docs/tutorials/default settings):

  • Infura: ~35%
  • Alchemy: ~25%
  • QuickNode: ~15%
  • Ankr: ~8%
  • Everyone else: ~17%

Top 2 providers = 60% of traffic

What happens if Infura goes down?

We saw this in November 2020 (Infura outage). Entire Ethereum ecosystem froze for 6 hours:

  • MetaMask didn’t work (uses Infura by default)
  • dApps down (hardcoded Infura endpoints)
  • Even “decentralized” apps were centralized on one provider

Has anything changed since 2020?

Not really. More providers exist, but concentration hasn’t decreased.

Why?

Network effects: Once a provider is “default” in docs/tutorials, everyone uses them.

Example: countless starter templates and tutorials shipped with Infura endpoints hardcoded. Millions of developers defaulted to Infura without ever making a deliberate choice.

The Solution Nobody Wants to Hear

At Token2049, did anyone talk about RPC infrastructure as a public good?

Radical idea: Ethereum Foundation, Polygon Labs, etc. should fund RPC infrastructure directly.

How it could work:

Option 1: Foundation-run RPC endpoints

  • Ethereum Foundation operates 10+ global RPC nodes
  • Free for developers
  • Funded by foundation treasury

Option 2: Subsidized decentralized RPC

  • Protocol pays network of independent node operators
  • Developers access via decentralized routing layer
  • No single point of failure

Option 3: Protocol-level RPC incentives

  • Add RPC service to validator/sequencer duties
  • Validators earn additional rewards for serving RPC requests
  • Built into protocol economics

Why hasn’t this happened?

Political: Foundations don’t want to “pick winners” by subsidizing certain infrastructure providers

Economic: Running RPC at scale is EXPENSIVE. Who pays?

Technical: Decentralized RPC is hard (need routing, load balancing, consensus on responses)

My opinion: We need option 2 or 3. Current model is unsustainable.

The Archive Node Economics You Mentioned

This is the most broken part of the ecosystem.

Archive node costs (Ethereum mainnet):

  • Disk space: 12TB+ (growing ~2TB/year)
  • Hardware: NVMe RAID array ($5K+)
  • Network: 10Gbps+ interface
  • Bandwidth: 50TB+/month
  • Total monthly cost: $2-3K per node

What users expect to pay: $0 (free tier) or $50/month

The math doesn’t work.

What’s happening:

Providers are subsidizing archive access with VC money.

But VCs eventually want profits. When the subsidy ends, either:

  • Prices go up (users revolt)
  • Service quality goes down (providers cut costs)
  • Providers shut down (market consolidation)

My prediction: In 2-3 years, archive node access will be expensive. Free tiers will be rate-limited to uselessness.

The solution: Protocol-level incentives for archive node operators. But no one wants to tackle this.

What I’m Doing About It

After 18 months of pain, here’s our current strategy:

Primary RPC: Alchemy (yes, we pay)

  • Reliable, fast, multi-chain
  • Cost: $15K/month (we’re high volume)

Backup RPC: QuickNode

  • Automatic failover if Alchemy has issues
  • Cost: $8K/month

Self-hosted nodes: 3 nodes in key regions

  • Tertiary backup
  • Also used for internal testing/development
  • Cost: $10K/month

Total RPC cost: $33K/month

Before optimization: We were spending $45K/month (multiple providers, no coordination)

We saved $12K/month by consolidating and negotiating volume discounts.

Questions for @infra_hans

You mentioned operating RPC nodes. I’m curious:

1. What’s your architecture?

  • How many regions?
  • Load balancing strategy?
  • How do you handle failover?

2. What’s your biggest operational challenge?

  • For us: Keeping nodes synced (chain reorganizations are a nightmare)

3. How do you price your service?

  • We struggled with this when we tried to offer RPC to partners

4. Have you seen the dRPC approach (decentralized RPC routing)?

  • They claim 95+ blockchains, 5,000 RPS, 7 geo-distributed clusters
  • Curious if this model works at scale

For the Community

@infra_hans is right. This is a crisis nobody talks about.

Every time you interact with a blockchain dApp, you’re probably using:

  • MetaMask (Infura RPC)
  • Hardhat (Alchemy RPC in examples)
  • Your favorite dApp (likely Alchemy or QuickNode backend)

The entire “decentralized” ecosystem runs on 5 companies.

What happens when one gets hacked? Or regulatory pressure? Or acqui-hired by a competitor?

We need better answers.

Sources:

  • Our internal RPC infrastructure data (18 months of operations)
  • Infura outage November 2020 analysis
  • Bitcoin archive node cost analysis
  • dRPC technical specifications
  • Conversations with other protocol infrastructure teams at Token2049

Smart contract developer here. This RPC discussion is eye-opening because I never think about RPC infrastructure.

And that’s exactly the problem.

The Developer Perspective (Ignorance is Bliss?)

My typical development workflow:

  1. Write smart contracts in Solidity
  2. Test locally with Hardhat (uses Hardhat Network, no RPC needed)
  3. Deploy to testnet: Use whatever RPC is in the Hardhat tutorial (Alchemy)
  4. Deploy to mainnet: Copy the same Alchemy endpoint
  5. Ship dApp to production
  6. Never think about RPC again

Until users complain: “Your dApp is down!”

Me: “The smart contract is on-chain! How can it be down?”

Reality: RPC endpoint is down. dApp can’t read blockchain state. From user perspective: Down.

The Incident That Taught Me About RPC

3 months ago: Shipped a new DeFi feature. Users staking, earning rewards, everything great.

2am Sunday: Phone explodes. Users reporting: “Can’t unstake! Funds locked!”

Panic mode: Check smart contract. No bugs. Etherscan shows transactions failing.

Root cause (after 3 hours of debugging):

Our RPC provider (free tier) hit rate limit. Requests getting 429 errors.

Why Sunday 2am? That’s when our rewards distribution cron job runs. It makes 1,000+ RPC calls to check all staker balances and blew through the rate limit.

Users thought: “Smart contract bug! My funds are locked!”

Reality: “RPC rate limit. Try again in an hour.”

But users don’t understand RPC. They just see: Can’t access funds. They panic.
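One standard mitigation for exactly this failure mode is retrying 429s with exponential backoff instead of hammering the endpoint. A minimal sketch; the `(status, result)` transport shape is just for illustration, not any real client library's API:

```python
import time

def call_with_backoff(rpc_call, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry an RPC call on HTTP 429 with exponential backoff (1s, 2s, 4s, ...).
    `rpc_call` returns (status_code, result); `sleep` is injectable so the
    logic can be tested without actually waiting."""
    for attempt in range(max_retries):
        status, result = rpc_call()
        if status != 429:
            return result
        sleep(base_delay * 2 ** attempt)
    raise RuntimeError("still rate-limited after retries")
```

It wouldn't have fixed the underlying quota problem, but it would have turned "funds locked!" into "rewards job ran a bit slow."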

The Things I Learned the Hard Way

Lesson 1: RPC is a single point of failure

Decentralization theater:

  • Smart contract: Decentralized ✅ (on-chain, immutable)
  • Frontend: Centralized ❌ (hosted on Vercel)
  • RPC endpoint: Centralized ❌❌ (single provider)

I thought I was building “decentralized finance.”

Reality: If Alchemy goes down, my “decentralized” app is centralized.

Lesson 2: Free tiers are not for production

What I used: Alchemy free tier (300M compute units/month)

What I needed: 500M+ compute units/month (our cron jobs are heavy)

What happened: Hit limit mid-month. dApp degraded. Users affected.

What I should have done: Pay for production tier from day 1.

Cost: $199/month (Growth plan)

How I justified NOT paying: “We’re a small team. Saving money.”

Cost of the incident: 3 engineers × 8 hours debugging + user trust lost = Way more than $199/month

Lesson 3: RPC response times affect UX

@infra_hans mentioned sub-50ms requirements. I never measured this.

Then I instrumented my dApp:

Time breakdown for “Swap Tokens” transaction:

  • User clicks “Swap”: 0ms
  • Frontend fetches token balances (RPC call): 45ms
  • Frontend calculates slippage (local): 5ms
  • Frontend fetches gas price (RPC call): 35ms
  • User reviews transaction: (user time)
  • User clicks “Confirm”: 0ms
  • MetaMask signs transaction: 100ms
  • Transaction broadcast (RPC call): 25ms
  • Wait for confirmation (RPC polling): 50ms × 15 polls = 750ms
  • UI updates (RPC call to fetch new balance): 40ms

Total RPC time: 895ms out of ~1000ms total

Insight: 90% of my dApp’s latency is RPC calls, not my code!

What @blockchain_brian said about P50/P95/P99 latency matters:

If RPC P99 latency is 450ms (his self-hosted nodes), then 1% of my users wait 2-3 seconds for a single operation. That’s terrible UX.
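Getting a breakdown like the one above takes only a small timing helper around each RPC call site. A Python sketch (the label names are illustrative, not from my actual code):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, sink):
    """Accumulate wall-clock milliseconds spent inside the block under
    `label`, so a swap flow can report where its time actually goes."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink[label] = sink.get(label, 0.0) + (time.perf_counter() - start) * 1000
```

Usage: wrap each call site, e.g. `with timed("rpc:eth_getBalance", timings): ...`, and the sink ends up holding the same kind of per-step breakdown as the table above.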

Lesson 4: Different RPC methods have different costs

I was making tons of eth_call requests (read contract state). Thought they were “free” because they don’t cost gas.

Reality: eth_call is EXPENSIVE for RPC providers.

Why?

  • Requires EVM execution (compute-heavy)
  • Might need historical state (archive node)
  • Can’t be cached easily (depends on block number)

Alchemy pricing:

  • eth_call: 26 compute units
  • eth_getBlockByNumber: 16 compute units
  • eth_getLogs: 75 compute units (!!)
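A quick way to budget this is multiplying per-method call counts by compute-unit costs. A sketch using the numbers above; verify them against your provider's current pricing page before relying on the totals:

```python
# Per-method compute-unit costs as quoted in this thread. These change over
# time: check your provider's pricing docs before budgeting with them.
CU_COSTS = {"eth_call": 26, "eth_getBlockByNumber": 16, "eth_getLogs": 75}

def monthly_compute_units(calls_per_day):
    """Rough monthly CU estimate from per-day call counts per method,
    assuming a 30-day month."""
    return 30 * sum(CU_COSTS[method] * n for method, n in calls_per_day.items())
```

Running this with even a modest 1,000 `eth_getLogs` calls per day lands at 2.25M compute units a month, which is how a single chatty feature quietly eats a free tier.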

My inefficient code:

I was calling eth_getLogs to find all Transfer events for a user. 75 compute units PER CALL.

Better approach: Use The Graph (indexes events) or better RPC caching.

After optimization: Reduced RPC calls by 60%. Saved money AND improved performance.

The Questions I Have After Reading This Thread

@infra_hans and @blockchain_brian, you both operate infrastructure. I have naive developer questions:

Question 1: What makes RPC so expensive?

I don’t get it. Reading blockchain data should be cheap, right? The data is already there.

Why does it cost $500K+/month (as mentioned in the panel)?

Is it:

  • Bandwidth? (serving lots of responses)
  • Compute? (executing eth_call requires EVM)
  • Storage? (archive nodes are huge)
  • Redundancy? (multiple regions for reliability)

Help me understand where the cost comes from.

Question 2: Why can’t protocols run their own RPC?

Ethereum Foundation has millions in funding. Why don’t they run free Ethereum RPC endpoints for everyone?

Same for Polygon, Arbitrum, etc.

Is it:

  • Too expensive?
  • Not their responsibility?
  • Political reasons?

Question 3: What should developers do?

Should we:

  • A) Always use commercial RPC providers (pay for reliability)
  • B) Run our own nodes (control our destiny)
  • C) Use decentralized RPC networks (dRPC, Pocket Network)
  • D) Multi-RPC strategy (redundant providers)

For context: Our dApp has 5K daily active users. Not huge, but not tiny.

Question 4: How do we make our dApps more resilient?

If RPC is a single point of failure, what’s the right architecture?

Should I:

  • Hardcode multiple RPC endpoints and failover?
  • Use a library that abstracts RPC and handles failover?
  • Accept that RPC centralization is reality and just pay for good providers?
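For what it's worth, the multi-RPC option can start as simply as a priority-ordered failover wrapper. A minimal sketch; the transports are stand-in callables for illustration, not a real client library:

```python
def call_with_failover(transports, method, params):
    """Try each JSON-RPC transport in priority order; return the first
    successful result. Each transport is a callable(method, params) that
    raises on timeout, rate limit, or connection error."""
    last_err = None
    for send in transports:
        try:
            return send(method, params)
        except Exception as err:
            last_err = err
    raise RuntimeError(f"all RPC providers failed: {last_err!r}")
```

Real implementations add health memory (stop retrying a dead primary on every request), but even this naive version would have saved my 2am Sunday.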

What I’m Changing After This Discussion

Immediate actions:

✅ Upgrade from free tier to paid plan ($199/month Alchemy Growth)

  • Should have done this 6 months ago

✅ Add backup RPC provider (QuickNode as fallback)

  • Cost: $50/month for their Growth tier
  • Implement client-side failover

✅ Instrument RPC performance (add monitoring)

  • Track P50/P95/P99 latency
  • Alert if latency spikes
  • Understand where time is spent

✅ Optimize RPC calls (reduce compute units)

  • Cache aggressively
  • Batch requests where possible
  • Use The Graph for event queries

Medium-term:

🔄 Evaluate decentralized RPC (dRPC, Pocket Network)

  • Test performance and reliability
  • Compare costs
  • Decide if it makes sense for us

🔄 Build RPC abstraction layer

  • Don’t hardcode Alchemy everywhere
  • Make it easy to switch providers
  • Implement smart failover

The Bigger Picture

This thread made me realize: Infrastructure matters more than smart contract code.

I spend 80% of my time on:

  • Smart contract logic
  • Gas optimization
  • Security audits

I spend 5% of my time on:

  • RPC infrastructure
  • Frontend performance
  • Monitoring and alerting

But from user perspective:

  • Smart contract: They don’t see it (just works in background)
  • RPC/Frontend: They directly experience it (fast or slow, working or broken)

I’ve been optimizing the wrong things.

Questions for the Community

For other developers:

  • Do you think about RPC infrastructure when building?
  • Have you had RPC-related incidents?
  • What’s your RPC strategy?

For users:

  • When a dApp is slow, do you blame the dApp or your internet?
  • Do you even know what RPC is?

For @infra_hans and @blockchain_brian:

  • What advice would you give to developers who are just starting to think about RPC seriously?
  • What are the most common mistakes you see?

This Token2049 discussion is valuable. More devs need to understand the infrastructure layer.

Sources:

  • Personal experience (3 months ago RPC incident)
  • Alchemy compute unit pricing documentation
  • My dApp instrumentation data (RPC latency measurements)
  • The Graph protocol for event indexing
  • Discussions with other developers at Singapore Blockchain Week side events

Product manager at a blockchain startup here. This RPC infrastructure discussion explains SO MUCH about our user pain points.

The Product Manager’s Blind Spot

What I worry about daily:

  • User onboarding friction
  • Transaction fees (gas costs)
  • UI/UX polish
  • Feature requests
  • Competitive analysis

What I SHOULD worry about (but didn’t until now):

  • RPC infrastructure reliability
  • RPC latency impact on UX
  • RPC costs at scale
  • RPC provider lock-in

Why the blind spot?

Because when I ask engineering “Why is the app slow?”, they say “Network latency” or “Blockchain is slow.”

I assumed: This is a blockchain limitation. Nothing we can do.

Reality (after reading this thread): It’s often RPC infrastructure, and there’s LOTS we can do.

The User Complaints That Now Make Sense

Complaint #1: “App is slow to load”

User flow:

  1. Open app
  2. Connect wallet
  3. App fetches: Token balances, transaction history, NFTs, protocol state
  4. User sees dashboard

What I thought: “Blockchain is slow. Users need to be patient.”

What @dev_aisha revealed: 90% of latency is RPC calls.

What I should do:

  • Measure RPC latency specifically
  • Optimize number of RPC calls on page load
  • Implement caching/prefetching
  • Show loading states per section (not block entire UI)
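The caching item is the cheapest win on that list. A minimal TTL cache sketch; the 5-second default is an assumption to tune per data type (balances can be seconds stale, historical transactions much longer):

```python
import time

class TTLCache:
    """Cache read-only RPC responses for a few seconds so one dashboard
    load doesn't refetch the same balances on every component render.
    `clock` is injectable so expiry can be tested deterministically."""
    def __init__(self, ttl_s=5.0, clock=time.monotonic):
        self.ttl_s, self.clock, self.store = ttl_s, clock, {}

    def get_or_fetch(self, key, fetch):
        now = self.clock()
        hit = self.store.get(key)
        if hit is not None and now - hit[0] < self.ttl_s:
            return hit[1]          # fresh enough: skip the RPC call
        value = fetch()            # stale or missing: fetch and store
        self.store[key] = (now, value)
        return value
```

Pairing this with a "Last updated: X seconds ago" label (as suggested later in this thread) keeps the staleness honest to users.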

Why this matters:

Our data (from analytics):

  • Average time to dashboard: 3.2 seconds
  • Users who see dashboard in <2 seconds: 78% retention
  • Users who wait >4 seconds: 45% retention

If we can cut load time from 3.2 seconds to under 2, we move users from the 45%-retention cohort into the 78% one, a 33-percentage-point swing.

ROI of better RPC infrastructure: Massive.

Complaint #2: “Transaction failed but I don’t know why”

Common scenario:

  1. User initiates transaction
  2. MetaMask pops up
  3. User confirms
  4. App shows “Transaction pending…”
  5. 2 minutes later: “Transaction failed”
  6. User: “What happened? Why?”

What I thought: “Smart contract reverted. User needs better error messages.”

What @blockchain_brian explained: Could be RPC timeout, rate limit, or network issue.

Better UX:

Instead of generic “Transaction failed”, we should show:

  • “RPC timeout - trying backup provider…”
  • “Rate limit hit - retrying in 10 seconds…”
  • “Network congestion - estimated wait: 3 minutes”

Users can handle waiting. They can’t handle uncertainty.

Complaint #3: “App shows wrong balance”

This one was baffling. User says “I have 100 USDC but app shows 50 USDC.”

What I thought: “User error. They’re looking at wrong wallet.”

What actually happened (after debugging):

Our RPC endpoint was 3 blocks behind the chain tip. The user had a recent transaction (2 blocks ago) that the app didn’t see yet.

From user perspective: App is broken.

Solution @blockchain_brian mentioned: Query multiple RPC endpoints, use consensus.

Better solution: Show “Last updated: X seconds ago” so users understand data freshness.

The Business Impact Nobody Quantifies

Our RPC cost analysis (I did this after reading this thread):

Current spending:

  • Alchemy: $15K/month (our main RPC)
  • QuickNode: $5K/month (backup)
  • Total: $20K/month = $240K/year

User metrics:

  • DAU: 8,000
  • MAU: 25,000
  • RPC cost per DAU: $2.50/month
  • RPC cost per MAU: $0.80/month

For comparison:

  • AWS hosting: $8K/month ($1.00 per DAU)
  • Smart contract deployment: $50K one-time
  • Engineering salaries: $150K/month

RPC is our #2 operational cost after salaries.

But:

I never looked at RPC ROI until now.

What if we spent $30K/month on better RPC infrastructure?

Hypothesis: Faster load times → Better retention → More users → More revenue

Let’s model it:

Current:

  • 25K MAU
  • 78% retention (2-second load time benchmark)
  • RPC cost: $20K/month

Better RPC (optimized for speed):

  • 25K MAU initially
  • 85% retention (consistent sub-2-second loads)
  • Month 2: 26.75K MAU (7% growth from retention)
  • Month 6: 31K MAU (compound effect)
  • RPC cost: $30K/month

Revenue impact (assuming $5 ARPU):

  • Current: 25K × $5 = $125K/month
  • Better RPC: 31K × $5 = $155K/month
  • Net gain: $30K/month revenue - $10K/month extra RPC cost = +$20K/month

ROI: 200%
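The model boils down to one line, so here's a sketch anyone can rerun with their own numbers. The MAU, ARPU, and cost inputs are the hypothetical figures from the scenario above, not real financials:

```python
def monthly_net_gain(mau_before, mau_after, arpu, rpc_before, rpc_after):
    """Net monthly gain = revenue added by retained users minus the
    extra RPC spend that bought them."""
    return (mau_after - mau_before) * arpu - (rpc_after - rpc_before)

# Hypothetical scenario from above: 25K -> 31K MAU, $5 ARPU, $20K -> $30K RPC.
gain = monthly_net_gain(25_000, 31_000, 5, 20_000, 30_000)
```

The $20K/month result against $10K/month of extra spend is where the 200% ROI figure comes from.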

This is a no-brainer investment. Why haven’t we done it?

Because I didn’t understand RPC impact on retention until this thread.

The Product Decisions Affected by RPC

Reading this discussion, I realize several product decisions were constrained by RPC without me knowing:

Decision 1: Real-time updates

User request: “Show my balance updating in real-time as transactions confirm”

Engineering: “We’d need to poll RPC every 5 seconds. Too expensive.”

Product decision: We refresh only when user navigates to page.

What I didn’t know: Polling cost is driven by RPC pricing model (compute units per call).

Better solution (now that I understand):

  • Use WebSocket RPC endpoints (supported by Alchemy, cheaper than polling)
  • Or: Subscribe to events via The Graph
  • Or: Pay more for higher rate limits

We made the product worse to save RPC costs I didn’t even know about.

Decision 2: Historical transaction views

User request: “Show all my transactions from past year”

Engineering: “eth_getLogs is expensive. We can only show last 30 days.”

Product decision: 30-day transaction history. Users complain.

What I didn’t know: eth_getLogs costs 75 compute units (as @dev_aisha mentioned). Large queries blow through rate limits.

Better solution:

  • Use The Graph (indexes events, much cheaper to query)
  • Or: Pay for higher tier with more compute units
  • Or: Implement pagination (load more as user scrolls)

Again: Product limitation due to RPC costs I wasn’t aware of.

Decision 3: Multi-chain support

User request: “Support Polygon, Arbitrum, Base, not just Ethereum”

Engineering: “Each chain needs separate RPC endpoints. Costs multiply.”

Product decision: Ethereum only for now.

What I didn’t know: Multi-chain providers serve 20+ chains from a single account, so costs and integration work don’t have to multiply per chain.

Better solution:

  • Use multi-chain RPC provider (one vendor, many chains)
  • Negotiate volume discount across all chains
  • Build once, deploy everywhere

We’re losing users to competitors who support more chains.

The Questions I’m Asking Engineering Now

After reading this thread, here’s my new checklist for every feature:

Before building:

  1. How many RPC calls does this feature require?

    • Load time impact?
    • Ongoing polling needs?
    • Compute unit costs?
  2. What happens if RPC is slow/down?

    • Graceful degradation?
    • Fallback to cached data?
    • User-facing error messages?
  3. Can we optimize RPC usage?

    • Batch requests?
    • Cache responses?
    • Use indexer (The Graph) instead?
  4. What’s the RPC cost at scale?

    • If feature is successful and 10x users adopt it?
    • Do we have budget?

These questions never existed in my PRD template before.

The Product Roadmap Changes

Features I’m now prioritizing:

P0: RPC reliability and monitoring

  • Implement multi-RPC failover (@blockchain_brian’s approach)
  • Add RPC performance monitoring (latency, success rate)
  • Alert engineering if RPC degrades

P0: RPC cost optimization

  • Audit all RPC calls (which are expensive?)
  • Implement caching layer
  • Use The Graph for event queries
  • Reduce load time from 3.2s to <2s

P1: Better user feedback for RPC issues

  • Show data freshness (“Updated 5 seconds ago”)
  • Better error messages (distinguish RPC timeout from contract error)
  • Loading states per component (not block entire UI)

P2: Multi-chain expansion

  • Now that I understand multi-chain RPC economics
  • Use provider like dRPC or Alchemy’s multi-chain offering
  • Ship Polygon and Arbitrum support

The Conversation We Need to Have with Users

Users don’t know what RPC is. Should we educate them?

Option 1: Abstract it completely

  • Never mention RPC
  • Just make it fast and reliable
  • Users don’t need to know

Option 2: Educate power users

  • Advanced settings: “Choose your RPC provider”
  • Some users want control (privacy, decentralization)
  • MetaMask does this (users can add custom RPC)

Option 3: Transparency when things break

  • If RPC is down, tell users “Our infrastructure provider is experiencing issues”
  • Don’t pretend it’s blockchain congestion when it’s RPC

I’m leaning toward Option 1 for general users, Option 2 for advanced settings.

My Questions for This Community

For @infra_hans:

You operate RPC infrastructure. From product perspective, what should we (customers) demand from RPC providers?

  • SLAs? (What’s reasonable: 99.9%, 99.99%?)
  • Latency guarantees?
  • Transparent status pages?
  • What else?

For @blockchain_brian:

You mentioned spending $33K/month on RPC. How do you justify this to your CFO/investors?

Is there a framework for “RPC as % of revenue” or “RPC cost per user” that makes sense?

For @dev_aisha:

As a developer, what do you wish product managers understood about RPC?

Where should we (product) be involved vs. leave it to engineering?

For everyone:

Should RPC cost be a product consideration? Or is this purely engineering ops?

The Token2049 Lesson

@infra_hans started this thread with “The RPC Infrastructure Crisis Nobody’s Talking About.”

As a product person, I wasn’t talking about it because I didn’t understand it.

Now I do. And it changes how I think about our product strategy.

Key insight: Infrastructure is product. Fast, reliable RPC = better UX = higher retention = more revenue.

We need to stop treating RPC as “boring DevOps” and start treating it as “core product infrastructure.”

Sources:

  • Our user analytics (retention data, load times)
  • Our RPC cost analysis (Alchemy + QuickNode spend)
  • User complaint analysis (support tickets)
  • Product metrics correlation (load time vs retention)
  • Conversations with product managers from other blockchain startups at Token2049 Week events