Rate Limits Are Not Reliability

#infrastructure #performance #reliability #rpc

Why Rate Limits Exist

Rate limits are usually introduced with good intentions. They promise fairness, protection against abuse, and predictable system behavior. Limiting how many requests a client can make seems like a reasonable way to prevent overload.

In early systems, this often works well enough. Traffic is low, usage patterns are simple, and the system has plenty of headroom.

Problems begin when traffic becomes real.

At scale, rate limits stop being a safety mechanism and start acting as a blunt instrument. They don’t manage load — they respond to it. And by the time they respond, the system is often already under stress.


Reliability Is About Predictability Under Pressure

Reliability is not about blocking traffic once things go wrong.
It is about remaining predictable as pressure increases.

A reliable system answers three questions continuously:

  • How much load can I safely handle right now?
  • Which requests are more expensive than others?
  • How do I slow traffic before instability spreads?

Rate limits answer none of these. They apply static thresholds to a dynamic system and assume all requests are equal. In production, neither assumption holds.


Request Count Is a Terrible Proxy for Load

One of the core flaws of rate limiting is that it measures the wrong thing.

RPC requests vary enormously in cost:

  • a cached read vs a state-heavy query
  • a simple balance lookup vs a multi-hop call
  • a local node hit vs a cold upstream request

Counting requests per second ignores this entirely.

Two clients making the same number of requests can impose vastly different load on the system. When expensive requests dominate, rate limits trigger too late. When cheap requests dominate, they trigger too early.

Either way, the system behaves unpredictably.
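To make the mismatch concrete, here is a minimal sketch of a limiter that charges each request by an estimated cost instead of counting it as one unit. The class name, capacity numbers, and cost table are all illustrative; in a real system the costs would come from profiling, not guesswork.

```python
import time

class CostAwareLimiter:
    """Token bucket that charges per-request *cost*, not per request count.

    capacity and refill_rate are in abstract "cost units"; the values
    here are invented for the example.
    """
    def __init__(self, capacity=100.0, refill_rate=50.0):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate  # cost units restored per second
        self.last = time.monotonic()

    def allow(self, cost):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# Hypothetical cost table: a multi-hop call is far heavier than a cached read.
COSTS = {"cached_read": 1, "balance_lookup": 2, "multi_hop_call": 25}

limiter = CostAwareLimiter()
cheap_allowed = sum(limiter.allow(COSTS["cached_read"]) for _ in range(10))
```

Ten cached reads barely dent the bucket, while a handful of multi-hop calls exhaust it. A plain requests-per-second counter would treat both clients identically.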


The Retry Amplification Trap

Rate limits don’t exist in isolation. They interact with client behavior — and that interaction is where many systems fail.

When a client receives a 429 response:

  • SDKs often retry automatically
  • applications retry with backoff
  • some clients retry aggressively
  • some don’t respect backoff at all

The rejected request doesn’t disappear. It becomes future load.

This is retry amplification: one rejected request turns into multiple delayed requests, often arriving in bursts. Instead of smoothing traffic, the system experiences repeated waves of pressure.

Under sustained load, this behavior can turn a short spike into a prolonged outage.
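One common client-side mitigation is randomized ("full jitter") exponential backoff. The toy comparison below, with invented client counts and delays, shows why it matters: a fixed retry delay returns the entire rejected wave at a single instant, while jitter spreads it across the window.

```python
import random

def naive_retry_times(n_clients, delay=1.0):
    # Every rejected client retries after the same fixed delay:
    # the whole wave comes back at once.
    return [delay for _ in range(n_clients)]

def jittered_retry_times(n_clients, attempt=1, base=1.0, cap=30.0):
    # "Full jitter" backoff: each client sleeps a random amount in
    # [0, min(cap, base * 2**attempt)], so the wave is spread out.
    ceiling = min(cap, base * (2 ** attempt))
    return [random.uniform(0.0, ceiling) for _ in range(n_clients)]

naive = naive_retry_times(1000)
jittered = jittered_retry_times(1000)
# naive: 1000 retries land at t=1.0 exactly.
# jittered: 1000 retries scattered across a 2-second window.
```

Jitter does not remove the retried load, it only reshapes it; the rejected requests still return, just not as a synchronized burst.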


When Rate Limits Hide the Real Problem

Another issue with rate limits is observability.

Once 429s start appearing, they mask what’s actually failing underneath:

  • rising tail latency
  • queue saturation
  • partial node degradation
  • uneven load distribution

From the outside, everything looks “controlled”. From the inside, the system is struggling.

Operators see fewer requests hitting the core systems, but that doesn’t mean the system is healthy. It often means the system is shielded from the very signals needed to recover correctly.


Static Limits in a Dynamic World

Most rate limits are configured statically:

  • X requests per second
  • Y per minute
  • Z concurrent connections

Production traffic is anything but static.

It includes:

  • sudden bursts
  • time-based patterns
  • uneven regional demand
  • cache invalidation storms
  • dependency slowdowns

A static limit cannot adapt to these conditions. It treats every second as identical, even when system capacity changes minute by minute.

This mismatch is why rate limits frequently oscillate between “too permissive” and “too strict”, with no stable middle ground.
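An alternative is to let the limit move with observed health. The sketch below is an additive-increase / multiplicative-decrease (AIMD) loop in the spirit of TCP congestion control; the thresholds, step sizes, and latency target are illustrative, not tuned values.

```python
class AdaptiveLimit:
    """Concurrency limit that adapts to observed latency via AIMD:
    back off sharply when latency exceeds a target, probe upward
    slowly when the system looks healthy.
    """
    def __init__(self, initial=100, floor=10, ceiling=1000):
        self.limit = initial
        self.floor = floor
        self.ceiling = ceiling

    def observe(self, latency_ms, target_ms=50):
        if latency_ms > target_ms:
            # Multiplicative decrease: shed capacity quickly under stress.
            self.limit = max(self.floor, int(self.limit * 0.8))
        else:
            # Additive increase: reclaim capacity one step at a time.
            self.limit = min(self.ceiling, self.limit + 1)

lim = AdaptiveLimit()
lim.observe(latency_ms=200)  # overload signal: limit drops from 100 to 80
lim.observe(latency_ms=10)   # healthy signal: limit creeps back to 81
```

The asymmetry is deliberate: capacity loss is detected late, so the reaction must be fast, while recovery can afford to be cautious.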


How Rate Limiting Can Worsen Outages

In many real incidents, rate limiting is not the initial cause of failure — but it accelerates the collapse.

A common pattern looks like this:

  1. Downstream dependency slows down
  2. Latency increases
  3. Queues grow
  4. Rate limits start rejecting traffic
  5. Clients retry
  6. Load increases again
  7. Recovery takes longer than the original failure

At this stage, the system is no longer serving traffic deterministically. Some requests succeed, others fail, and behavior varies by client and timing.

This is not graceful degradation. It is instability with guardrails.
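The steps above can be caricatured in a few lines. This toy discrete-time model halves capacity during the outage and feeds a fraction of rejected requests back as retries on the next tick; every number in it is invented for illustration.

```python
def simulate(duration, outage_end, retries_enabled,
             base_load=100, capacity=100, retry_fraction=0.9):
    """Toy model of the cascade: capacity is halved until outage_end,
    and a fraction of rejected requests returns as retries on the next
    tick (the rest give up). Returns the tick at which offered load
    first fits under full capacity again.
    """
    pending_retries = 0
    for t in range(duration):
        cap = capacity // 2 if t < outage_end else capacity
        offered = base_load + pending_retries
        rejected = max(0, offered - cap)
        pending_retries = int(rejected * retry_fraction) if retries_enabled else 0
        if t >= outage_end and offered <= cap:
            return t
    return None

without_retries = simulate(duration=200, outage_end=5, retries_enabled=False)
with_retries = simulate(duration=200, outage_end=5, retries_enabled=True)
# Without retries the system recovers the moment the dependency does;
# with retries, recovery trails the original failure by many ticks.
```

The point of the model is step 7: the dependency is healthy again at tick 5, but the retry backlog keeps offered load above capacity long afterward.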


What Reliable Systems Do Differently

Reliable systems focus on controlling load before rejection becomes necessary.

That means:

  • understanding request cost, not just volume
  • applying backpressure instead of hard failures
  • adapting limits based on real-time capacity
  • prioritizing critical traffic
  • degrading functionality intentionally

Rate limits may still exist, but they are not the primary control mechanism. They are a last line of defense, not the foundation.

The goal is to slow traffic early and smoothly, not to drop it suddenly.
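As one illustration of "prioritizing critical traffic", here is a sketch of a bounded admission queue that sheds the least critical queued request when full, rather than rejecting whichever request happens to arrive last. The class name, priorities, and queue depth are invented for the example.

```python
import heapq

class PriorityAdmission:
    """Bounded admission queue with priority-aware shedding.

    Lower priority number = more critical. When the queue overflows,
    the *least* critical queued request is dropped, not the newest one.
    """
    def __init__(self, max_depth=3):
        self.max_depth = max_depth
        self.heap = []   # (priority, seq, request)
        self.seq = 0     # tie-breaker so requests never compare directly

    def submit(self, priority, request):
        self.seq += 1
        heapq.heappush(self.heap, (priority, self.seq, request))
        if len(self.heap) > self.max_depth:
            # Shed the least critical queued request and report it,
            # so the caller can fail it explicitly (or degrade it).
            shed = max(self.heap)
            self.heap.remove(shed)
            heapq.heapify(self.heap)
            return shed[2]
        return None

q = PriorityAdmission(max_depth=2)
q.submit(1, "health_check")    # critical: admitted
q.submit(5, "bulk_backfill")   # low priority: admitted for now
q.submit(2, "user_request")    # forces a shed: bulk_backfill is dropped
```

Contrast this with a plain rate limit, which would have rejected the user request simply because it arrived third.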


Why This Matters Even More for RPC Infrastructure

RPC systems amplify all of these issues.

They sit directly in the request path of applications. Latency changes application behavior. Retries cascade across services. A single overloaded endpoint can affect thousands of clients simultaneously.

Treating RPC traffic as uniform and enforcing static rate limits is a recipe for unpredictable failures. The system might survive light load, but it will behave erratically under real-world conditions.

Reliable RPC infrastructure is designed with the assumption that:

  • retries will happen
  • traffic will spike
  • partial failures are normal
  • not all requests are equal

Anything less is optimism disguised as engineering.


Closing Thoughts

Rate limits are easy to explain and easy to implement. That’s why they are so often mistaken for reliability features.

In reality, they are a symptom-oriented response to a deeper problem: lack of load awareness and control.

If a system relies on rate limits to stay healthy, it is already operating too close to its limits. True reliability comes from shaping traffic proactively, understanding cost, and designing systems that remain predictable under stress — not from rejecting requests once it’s too late.

In production, reliability is not about saying “no”.
It’s about staying in control.
