Single provider dependencies are a security vulnerability. Here’s how to build resilient RPC infrastructure.
The Risk Model
Single Provider Failure Modes:
- Service outage - Provider goes down, you go down
- Rate limiting - Traffic spike exhausts your quota
- Policy change - Provider decides to ban your use case
- Pricing change - Costs suddenly become prohibitive
- Security breach - Provider’s infrastructure is compromised
- Regulatory action - Provider forced to block certain users/regions
Any of these can happen without warning.
The Multi-Provider Architecture
┌─────────────────┐
│ Load Balancer │
│ (Smart Router) │
└────────┬────────┘
│
┌───────────────────┼───────────────────┐
│ │ │
┌────▼────┐ ┌────▼────┐ ┌────▼────┐
│ Primary │ │Secondary│ │ Fallback│
│ Alchemy │ │QuickNode│ │ dRPC │
└─────────┘ └─────────┘ └─────────┘
Smart Routing Logic
class MultiProviderRPC {
  constructor(primary, secondary, fallback) {
    this.primary = primary;
    this.secondary = secondary;
    this.fallback = fallback;
    this.metrics = { primaryFail: 0, secondaryFail: 0 };
  }

  async call(method, params) {
    // Try primary
    try {
      return await this.primary.call(method, params);
    } catch (e) {
      this.metrics.primaryFail++;
    }

    // Failover to secondary
    try {
      return await this.secondary.call(method, params);
    } catch (e) {
      this.metrics.secondaryFail++;
    }

    // Last resort fallback
    return await this.fallback.call(method, params);
  }
}
Provider Selection Criteria
| Role | Optimize For | Example |
| --- | --- | --- |
| Primary | Performance + Features | QuickNode, Alchemy |
| Secondary | Different infrastructure | Different provider |
| Fallback | Availability + Decentralization | dRPC, Pocket |
Critical: Use Different Providers
Don’t use Alchemy primary + Alchemy secondary. When Alchemy has issues, both fail.
Implementation Patterns
1. Health Checks
Ping all providers every 30 seconds. Route away from unhealthy ones.
2. Latency-Based Routing
Track P50/P99 latency. Shift traffic to fastest healthy provider.
3. Quota Management
Track usage against limits. Preemptively failover before hitting caps.
4. Response Verification
Cross-check critical responses against multiple providers.
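Patterns 2 and 3 can be sketched together: route to the fastest healthy provider, skipping any that are near their quota. A minimal sketch; the provider object shape (`healthy`, `p50Ms`, `used`, `quota`) and the 90% quota cutoff are assumptions, not from any particular library.

```javascript
// Latency-based routing: pick the healthy provider with the lowest
// recent P50 latency. Field names here are illustrative.
function pickProvider(providers) {
  const healthy = providers.filter((p) => p.healthy);
  if (healthy.length === 0) throw new Error('No healthy providers');
  return healthy.reduce((best, p) => (p.p50Ms < best.p50Ms ? p : best));
}

// Quota-aware variant: skip providers close to their usage cap so we
// fail over preemptively (the 0.9 threshold is an assumption; tune it).
function pickWithQuota(providers) {
  const eligible = providers.filter(
    (p) => p.healthy && p.used / p.quota < 0.9
  );
  return pickProvider(eligible.length ? eligible : providers);
}
```

In practice you would feed `p50Ms` and `used` from the health checks and quota tracking described above, re-evaluating the choice on every call or on a short interval.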
The Cost of Resilience
Multi-provider adds ~50-100% to RPC costs. But what’s the cost of 4 hours downtime?
For a $50M TVL protocol, even 0.1% of TVL at risk during downtime = $50K.
The insurance math is clear.
Great framework. Let me add implementation details from running this in production.
Our Monitoring Stack
Multi-provider is useless without proper monitoring:
# Prometheus metrics we track
rpc_request_total{provider, method, status}
rpc_request_latency_ms{provider, method, quantile}
rpc_provider_health{provider}
rpc_failover_total{from_provider, to_provider}
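The label structure above can be mirrored with a tiny in-memory counter store. This is a stand-in sketch for illustration only; in production you would use a real Prometheus client such as prom-client, which this is not.

```javascript
// Minimal in-memory counter store mirroring the labeled metrics above.
// Not a Prometheus client; it only demonstrates the name + label scheme.
class RpcMetrics {
  constructor() {
    this.counters = new Map();
  }

  // Build a stable key from the metric name plus sorted labels.
  key(name, labels) {
    const parts = Object.keys(labels)
      .sort()
      .map((k) => `${k}=${labels[k]}`);
    return `${name}{${parts.join(',')}}`;
  }

  inc(name, labels) {
    const k = this.key(name, labels);
    this.counters.set(k, (this.counters.get(k) || 0) + 1);
  }

  get(name, labels) {
    return this.counters.get(this.key(name, labels)) || 0;
  }
}
```

Every request path then calls something like `metrics.inc('rpc_request_total', { provider: 'alchemy', method: 'eth_call', status: 'ok' })`, and every failover increments `rpc_failover_total` with `from_provider`/`to_provider` labels.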
Alert Thresholds
| Metric | Warning | Critical | Action |
| --- | --- | --- | --- |
| Error rate | 1% | 5% | Failover |
| P99 latency | 500ms | 1000ms | Investigate |
| Quota usage | 80% | 95% | Failover |
| Health check | 2 failures | 3 failures | Remove from pool |
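The thresholds in that table reduce to a small evaluator. A sketch; the metric keys and return values are my naming, the numbers come straight from the table.

```javascript
// Threshold values from the alert table above.
const THRESHOLDS = {
  errorRate:  { warning: 0.01, critical: 0.05 }, // 1% / 5%
  p99Ms:      { warning: 500,  critical: 1000 }, // milliseconds
  quotaUsage: { warning: 0.80, critical: 0.95 }, // fraction of quota
};

// Classify a single metric reading as 'ok', 'warning', or 'critical'.
function evaluate(metric, value) {
  const t = THRESHOLDS[metric];
  if (!t) throw new Error(`Unknown metric: ${metric}`);
  if (value >= t.critical) return 'critical';
  if (value >= t.warning) return 'warning';
  return 'ok';
}
```

The alerting layer then maps `'critical'` on error rate or quota usage to an automatic failover, and `'critical'` on latency to a page for investigation.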
The Dashboard
Every 30 seconds we check:
- Each provider responds to `eth_blockNumber`
- Response time < 500ms
- Block number is within 2 of the highest seen
Providers failing 3 consecutive checks get removed from rotation.
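That checklist can be sketched as a single check function. The `getBlockNumber()` call follows the ethers v6 provider API; the mutable `state` object and its field names are my assumptions.

```javascript
// One health-check pass for one provider, mirroring the checklist above:
// responds to a block-number query, under 500ms, within 2 blocks of the
// highest block seen across all providers.
async function checkProvider(provider, state, highestBlock) {
  const start = Date.now();
  try {
    const block = await provider.getBlockNumber();
    const latency = Date.now() - start;
    const healthyNow =
      latency < 500 &&            // response-time threshold
      highestBlock - block <= 2;  // block-lag threshold
    state.failures = healthyNow ? 0 : state.failures + 1;
  } catch (e) {
    state.failures += 1;
  }
  // Removed from rotation after 3 consecutive failed checks.
  state.inRotation = state.failures < 3;
  return state;
}
```

Run once per provider every 30 seconds; any success resets the failure counter, so only consecutive failures remove a provider.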
Failover Testing
We run chaos engineering monthly:
- Randomly disable primary provider
- Verify failover works
- Measure failover latency
- Check for dropped requests
Critical learning: Test failover regularly. Untested failover is not failover.
Recovery
After provider recovers:
- Wait for 5 consecutive health checks
- Add back at 10% traffic
- Gradually increase over 30 minutes
- Full traffic only if latency normalizes
Never slam full traffic back to a recovering provider.
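The ramp-up schedule above can be expressed as a weight function. The 10% starting share and 30-minute window come from the list; the linear curve in between is an assumption (step increases work too).

```javascript
// Traffic share for a recovering provider: 10% at recovery, ramping
// linearly to 100% over 30 minutes. Assumes the 5-consecutive-healthy
// gate has already passed before minute 0.
function recoveryWeight(minutesSinceRecovery) {
  if (minutesSinceRecovery < 0) return 0;      // not yet recovered
  if (minutesSinceRecovery >= 30) return 1.0;  // full traffic
  // Linear ramp from 10% to 100% over 30 minutes.
  return 0.1 + 0.9 * (minutesSinceRecovery / 30);
}
```

The router multiplies each provider's normal share by this weight, and holds at the current weight if latency degrades instead of continuing the ramp.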
This is great but let’s talk about cost-effective multi-provider for startups.
Budget Multi-Provider Stack
| Tier | Provider | Cost (monthly) | Use Case |
| --- | --- | --- | --- |
| Primary | Alchemy Free | $0 | 30M CU/month |
| Secondary | QuickNode Build | $49 | 80M credits |
| Fallback | Pocket Free | $0 | 100K/day |
Total: $49/month for multi-provider resilience.
The Catch
Free tiers have limits:
- Alchemy: 25 RPS
- Pocket: Daily caps
- QuickNode: Credit limits
For low-traffic apps, this works. For growth, you’ll outgrow it.
Simple Failover Without Complexity
import { ethers } from 'ethers';

const providers = [
  new ethers.JsonRpcProvider(process.env.ALCHEMY_URL),
  new ethers.JsonRpcProvider(process.env.QUICKNODE_URL),
  new ethers.JsonRpcProvider(process.env.POCKET_URL),
];

async function resilientCall(method, params) {
  let lastError;
  for (const provider of providers) {
    try {
      return await provider.send(method, params);
    } catch (e) {
      lastError = e;
      console.log(`Provider failed (${e.message}), trying next...`);
    }
  }
  throw new Error(`All providers failed: ${lastError.message}`);
}
About 20 lines of code for a meaningful resilience improvement.
When to Upgrade
Invest in proper multi-provider infrastructure when:
- Traffic exceeds free tier limits
- You have SLA requirements
- Downtime has measurable cost
Until then, simple failover chains work fine.
SDK and library support for failover has gotten much better. Here’s the current state.
Ethers.js v6
import { FallbackProvider, JsonRpcProvider } from 'ethers';

const provider = new FallbackProvider([
  { provider: new JsonRpcProvider(ALCHEMY_URL), priority: 1, stallTimeout: 2000 },
  { provider: new JsonRpcProvider(QUICKNODE_URL), priority: 2, stallTimeout: 2000 },
  { provider: new JsonRpcProvider(POCKET_URL), priority: 3, stallTimeout: 3000 },
]);
FallbackProvider handles:
- Automatic failover on errors
- Stall timeout for slow providers
- Priority-based routing
- Quorum checking for critical calls
Viem
import { createClient, fallback, http } from 'viem';

const client = createClient({
  transport: fallback([
    http(ALCHEMY_URL),
    http(QUICKNODE_URL),
    http(POCKET_URL),
  ]),
});
Viem’s fallback transport is even simpler.
web3.js
Unfortunately, web3.js doesn’t have built-in fallback. You need wrapper code.
My Recommendations
- Use Viem or Ethers v6 - Built-in failover is battle-tested
- Set appropriate timeouts - Default timeouts are often too long
- Configure quorum for writes - Multiple provider confirmation for transactions
- Don’t reinvent the wheel - Library fallback is better than custom code
The Gotcha
None of these handle provider-specific APIs. If you’re using Alchemy’s enhanced endpoints, there’s no automatic fallback to QuickNode’s equivalent.
Stick to standard JSON-RPC for any operation that needs to fail over.
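One way to enforce that split is to route by method name: provider-specific methods go only to the provider that supports them, everything else goes through the full chain. A sketch; the prefix check is illustrative and the `name` field on providers is my assumption.

```javascript
// Route a JSON-RPC method to the providers that can serve it.
// Alchemy's enhanced methods use an 'alchemy_' prefix; other vendors'
// extensions would need their own checks (not exhaustive here).
function routeCall(method, providers) {
  if (method.startsWith('alchemy_')) {
    // Enhanced API: no cross-provider fallback is possible.
    return providers.filter((p) => p.name === 'alchemy');
  }
  // Standard JSON-RPC: every provider in the chain can serve it.
  return providers;
}
```

Calls that come back with an empty or single-element list are the ones with no real failover story, which is a useful signal for deciding where you depend on one vendor.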