Rate Limits Are Not Reliability
Why Rate Limits Exist
Rate limits are usually introduced with good intentions. They promise fairness, protection against abuse, and predictable system behavior. Limiting how many requests a client can make seems like a reasonable way to prevent overload.
In early systems, this often works well enough. Traffic is low, usage patterns are simple, and the system has plenty of headroom.
Problems begin when traffic becomes real.
At scale, rate limits stop being a safety mechanism and start acting as a blunt instrument. They don’t manage load — they respond to it. And by the time they respond, the system is often already under stress.
Reliability Is About Predictability Under Pressure
Reliability is not about blocking traffic once things go wrong.
It is about remaining predictable as pressure increases.
A reliable system answers three questions continuously:
- How much load can I safely handle right now?
- Which requests are more expensive than others?
- How do I slow traffic before instability spreads?
Rate limits answer none of these. They apply static thresholds to a dynamic system and assume all requests are equal. In production, neither assumption holds.
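To make the "static threshold" point concrete, here is a minimal sketch of a classic token-bucket limiter, the shape most rate limiters take in practice (class and parameter names are illustrative). Note that nothing in it knows about capacity, cost, or pressure — only counts.

```python
import time

class TokenBucket:
    """A classic count-based rate limiter: every request costs one token,
    regardless of how expensive it actually is to serve."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec       # static threshold, fixed at config time
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens at the configured rate, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # callers typically map this to HTTP 429

bucket = TokenBucket(rate_per_sec=100, burst=10)
print(bucket.allow())  # True: tokens available
```

The limiter never asks any of the three questions above: it cannot tell an expensive request from a cheap one, and its threshold does not move when system capacity does.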
Request Count Is a Terrible Proxy for Load
One of the core flaws of rate limiting is that it measures the wrong thing.
RPC requests vary enormously in cost:
- a cached read vs a state-heavy query
- a simple balance lookup vs a multi-hop call
- a local node hit vs a cold upstream request
Counting requests per second ignores this entirely.
Two clients making the same number of requests can impose vastly different load on the system. When expensive requests dominate, rate limits trigger too late. When cheap requests dominate, they trigger too early.
Either way, the system behaves unpredictably.
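One common alternative is to meter cost units instead of request counts. The sketch below charges each request a weight based on its method; the cost table is purely illustrative — a real system would measure these values rather than hard-code them.

```python
import time

# Illustrative relative costs; real deployments would measure these.
METHOD_COST = {
    "eth_getBalance": 1,    # cheap, often cached read
    "eth_call": 5,          # state-heavy execution
    "eth_getLogs": 25,      # potentially large range scan
}

class WeightedBucket:
    """Admission control measured in cost units, not request counts."""

    def __init__(self, units_per_sec: float, burst_units: float):
        self.rate = units_per_sec
        self.capacity = burst_units
        self.units = burst_units
        self.last = time.monotonic()

    def allow(self, method: str) -> bool:
        cost = METHOD_COST.get(method, 5)   # assumed default for unknown methods
        now = time.monotonic()
        self.units = min(self.capacity, self.units + (now - self.last) * self.rate)
        self.last = now
        if self.units >= cost:
            self.units -= cost
            return True
        return False
```

Under this scheme, a client issuing ten `eth_getLogs` calls consumes 250 units while a client issuing ten balance lookups consumes 10 — the limiter responds to the load actually imposed, not to a request count that hides it.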
The Retry Amplification Trap
Rate limits don’t exist in isolation. They interact with client behavior — and that interaction is where many systems fail.
When a client receives a 429 response:
- SDKs often retry automatically
- applications retry with backoff
- some clients retry aggressively
- some don’t respect backoff at all
The rejected request doesn’t disappear. It becomes future load.
This is retry amplification: one rejected request turns into multiple delayed requests, often arriving in bursts. Instead of smoothing traffic, the system experiences repeated waves of pressure.
Under sustained load, this behavior can turn a short spike into a prolonged outage.
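A back-of-the-envelope model shows how quickly this compounds. The sketch below assumes the simplest possible retry behavior — every rejected request comes back exactly once per retry round — which is optimistic compared to clients that retry without backoff.

```python
def effective_load(offered_rps: float, capacity_rps: float, max_retries: int) -> float:
    """Estimate the arrival rate once rejected requests are retried.
    Each rejection above capacity returns as a future arrival."""
    arrivals = offered_rps
    rejected = max(0.0, arrivals - capacity_rps)
    for _ in range(max_retries):
        arrivals = offered_rps + rejected        # fresh traffic plus retries
        rejected = max(0.0, arrivals - capacity_rps)
    return arrivals

# A 20% overload (1200 rps offered against 1000 rps capacity) balloons
# once three retry rounds stack on top of fresh traffic:
print(effective_load(1200, 1000, max_retries=3))  # → 1800.0
```

The system was 20% over capacity; after retries it faces 80% more traffic than it can serve. This is why a short spike can persist as a prolonged outage long after the original cause is gone.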
When Rate Limits Hide the Real Problem
Another issue with rate limits is observability.
Once 429s start appearing, they mask what’s actually failing underneath:
- rising tail latency
- queue saturation
- partial node degradation
- uneven load distribution
From the outside, everything looks “controlled”. From the inside, the system is struggling.
Operators see fewer requests hitting the core systems, but that doesn’t mean the system is healthy. It often means the system is shielded from the very signals needed to recover correctly.
Static Limits in a Dynamic World
Most rate limits are configured statically:
- X requests per second
- Y per minute
- Z concurrent connections
Production traffic is anything but static.
It includes:
- sudden bursts
- time-based patterns
- uneven regional demand
- cache invalidation storms
- dependency slowdowns
A static limit cannot adapt to these conditions. It treats every second as identical, even when system capacity changes minute by minute.
This mismatch is why rate limits frequently oscillate between “too permissive” and “too strict”, with no stable middle ground.
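An adaptive limit avoids this oscillation by moving with observed health rather than sitting at a fixed value. The sketch below uses an AIMD (additive-increase, multiplicative-decrease) rule keyed on tail latency; the target, step sizes, and decay factor are all illustrative assumptions, not recommended values.

```python
class AdaptiveLimit:
    """AIMD-style concurrency limit: grow slowly while the system is healthy,
    cut sharply when tail latency crosses a target. Thresholds are illustrative."""

    def __init__(self, initial: int = 100, floor: int = 10, ceiling: int = 1000):
        self.limit = initial
        self.floor = floor
        self.ceiling = ceiling

    def observe(self, p99_latency_ms: float, target_ms: float = 250.0) -> int:
        if p99_latency_ms > target_ms:
            # Multiplicative decrease: shed load quickly under stress.
            self.limit = max(self.floor, int(self.limit * 0.7))
        else:
            # Additive increase: probe for headroom gradually.
            self.limit = min(self.ceiling, self.limit + 5)
        return self.limit
```

Because the limit follows capacity as it changes minute by minute, there is no single static number that has to be simultaneously "not too permissive" and "not too strict".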
How Rate Limiting Can Worsen Outages
In many real incidents, rate limiting is not the initial cause of failure — but it accelerates the collapse.
A common pattern looks like this:
- Downstream dependency slows down
- Latency increases
- Queues grow
- Rate limits start rejecting traffic
- Clients retry
- Load increases again
- Recovery takes longer than the original failure
At this stage, the system is no longer serving traffic deterministically. Some requests succeed, others fail, and behavior varies by client and timing.
This is not graceful degradation. It is instability with guardrails.
What Reliable Systems Do Differently
Reliable systems focus on controlling load before rejection becomes necessary.
That means:
- understanding request cost, not just volume
- applying backpressure instead of hard failures
- adapting limits based on real-time capacity
- prioritizing critical traffic
- degrading functionality intentionally
Rate limits may still exist, but they are not the primary control mechanism. They are a last line of defense, not the foundation.
The goal is to slow traffic early and smoothly, not to drop it suddenly.
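Two of these ideas — backpressure via bounded concurrency, and prioritizing critical traffic — can be combined in a single admission gate. The sketch below is one possible shape, with utilization thresholds and priority names chosen purely for illustration.

```python
import threading

class BackpressureGate:
    """Bounded concurrency with priority-aware shedding: best-effort traffic
    is shed as the system fills up, before anything critical is touched.
    Thresholds and priority names are illustrative."""

    def __init__(self, max_inflight: int):
        self.max_inflight = max_inflight
        self.inflight = 0
        self.lock = threading.Lock()

    def try_admit(self, priority: str) -> bool:
        with self.lock:
            usage = self.inflight / self.max_inflight
            # Critical traffic is admitted until the system is truly full;
            # best-effort traffic starts shedding at 70% utilization.
            threshold = 1.0 if priority == "critical" else 0.7
            if usage < threshold:
                self.inflight += 1
                return True
            return False

    def release(self) -> None:
        with self.lock:
            self.inflight -= 1
```

The important property is the ordering: load shedding begins early, gradually, and on the least important traffic first — rather than all at once when a global counter trips.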
Why This Matters Even More for RPC Infrastructure
RPC systems amplify all of these issues.
They sit directly in the request path of applications. Latency changes application behavior. Retries cascade across services. A single overloaded endpoint can affect thousands of clients simultaneously.
Treating RPC traffic as uniform and enforcing static rate limits is a recipe for unpredictable failures. The system might survive light load, but it will behave erratically under real-world conditions.
Reliable RPC infrastructure is designed with the assumption that:
- retries will happen
- traffic will spike
- partial failures are normal
- not all requests are equal
Anything less is optimism disguised as engineering.
Closing Thoughts
Rate limits are easy to explain and easy to implement. That’s why they are so often mistaken for reliability features.
In reality, they are a symptom-oriented response to a deeper problem: lack of load awareness and control.
If a system relies on rate limits to stay healthy, it is already operating too close to its limits. True reliability comes from shaping traffic proactively, understanding cost, and designing systems that remain predictable under stress — not from rejecting requests once it’s too late.
In production, reliability is not about saying “no”.
It’s about staying in control.
See also
- Designing a Production-Grade RPC Failover Layer: Adding multiple RPC endpoints is easy. Designing a production-grade failover layer with health scoring, stale node detection, latency tracking, and circuit breaking is not. This article breaks down what it actually takes.
- Tracing a Web3 Request End-to-End: Where Latency and Failure Actually Come From: RPC performance issues rarely originate at the node itself. Latency, inconsistency, and failure are introduced across a chain of systems long before a request reaches a validator. This article traces a Web3 request end-to-end to show where delays accumulate, errors are masked, and reliability quietly degrades.
- How to Benchmark RPC Providers Correctly: Most RPC benchmarks measure the wrong things. Average latency and request rates often hide degradation, throttling, and stale state that only appear under real load. This article explains how to benchmark RPC providers correctly, focusing on reliability, consistency, and behavior under stress, not just speed.
