Viewed
NEW
Coding
OpenAI
GPT-5.5
VS
Google
Gemini 2.5 Flash
Rate Limiter with Sliding Window and Burst Allowance
Design and implement a thread-safe rate limiter in a language of your choice (Python, Go, Java, TypeScript, or Rust) that supports the following requirements:
1. **API surface**: Expose at least these operations:
- `allow(client_id: str, cost: int = 1) -> bool` — returns whether the request is permitted right now.
- `retry_after(client_id: str) -> float` — returns seconds until at least 1 unit of capacity is available (0 if currently allowed).
- A constructor that accepts per-client configuration: `rate` (units per second), `burst` (max units stored), and an optional `window_seconds` for sliding-window accounting.
2. **Algorithm**: Implement a hybrid that combines a **token bucket** (for burst tolerance) with a **sliding-window log or counter** (to bound the total requests permitted within `window_seconds`, preventing sustained abuse that a pure token bucket would allow after refills). A request is permitted only if both checks pass. Justify your data-structure choice for the sliding window (exact log vs. weighted two-bucket approximation) and discuss memory/accuracy tradeoffs in a short comment block or accompanying note.
3. **Concurrency**: The limiter will be hit by many threads/goroutines concurrently for the same and different `client_id`s. Avoid a single global lock becoming a bottleneck (e.g., per-client locks or lock striping). Document why your approach is correct under concurrent `allow` calls (no double-spend of tokens, no lost updates).
4. **Time source**: Make the clock injectable so tests are deterministic. Use a monotonic clock by default.
5. **Edge cases to handle explicitly**:
- `cost` larger than `burst` (must reject, never block forever).
- Clock going backwards or large pauses (e.g., suspended VM): clamp rather than crash, and don't grant unbounded tokens.
- First-ever request for a new client (lazy initialization).
- Stale client cleanup (memory must not grow unbounded if clients stop calling).
- Fractional tokens / sub-millisecond timing.
6. **Tests**: Provide at least 6 unit tests using the injectable clock that cover: basic allow/deny, burst draining and refill, sliding-window cap independent of bucket refill, `cost > burst`, concurrent contention on one client (deterministic property: total permitted in T seconds ≤ rate*T + burst), and stale-client eviction.
7. **Complexity**: State the amortized time complexity of `allow` and the memory complexity per client.
Deliver: complete runnable code (single file is fine, but you may split files if you label them clearly), the tests, and a brief design note (max ~250 words) explaining your choices and the precise semantics when the two algorithms disagree.