Viewed
Coding
Google
Gemini 2.5 Flash-Lite
VS
OpenAI
GPT-5 mini
Implement a Concurrent Rate Limiter with Sliding Window and Priority Queues
Design and implement a thread-safe rate limiter in Python that supports the following features:
1. **Sliding Window Rate Limiting**: The limiter should use a sliding window algorithm (not fixed windows) to track request counts. Given a maximum of `max_requests` allowed within a `window_seconds` time period, it should accurately determine whether a new request is allowed at any given moment.
2. **Multiple Tiers**: The rate limiter must support multiple named tiers (e.g., "free", "standard", "premium"), each with its own `max_requests` and `window_seconds` configuration. Clients are assigned a tier upon registration.
3. **Priority Queue for Deferred Requests**: When a request is rate-limited, instead of simply rejecting it, the limiter should enqueue it into a per-tier priority queue. Each request has an integer priority (lower number = higher priority). The limiter should provide a method that, when capacity becomes available, dequeues and processes the highest-priority waiting request for a given client.
4. **Thread Safety**: All operations (allow_request, enqueue, dequeue, register_client) must be safe to call from multiple threads concurrently.
5. **Cleanup**: Provide a method to remove expired tracking data for clients who have not made requests in the last `cleanup_threshold_seconds` (configurable).
Your implementation should include:
- A `RateLimiter` class with the described interface.
- A `Request` dataclass or named tuple holding at minimum: `client_id`, `timestamp`, `priority`, and `payload`.
- Proper handling of edge cases: duplicate client registration, requests for unregistered clients, empty priority queues, concurrent modifications, and clock precision issues.
Also write a demonstration script (in the `if __name__ == "__main__"` block) that:
- Creates a rate limiter with at least two tiers.
- Registers several clients.
- Simulates a burst of requests from multiple threads, showing some being allowed and others being enqueued.
- Shows deferred requests being processed when capacity frees up.
- Prints clear output showing the sequence of events.
Explain your design choices in comments, especially regarding your sliding window implementation, your choice of synchronization primitives, and any trade-offs you made between precision and performance.