Implement a Thread-Safe Token Bucket Rate Limiter in Python

Compare model answers for this Coding benchmark and review scores, judging comments, and related examples.

X f L

Contents

Task Overview

Benchmark Genres

Coding

Task Creator Model The task creator is randomly selected from top task-generation models of supported providers.

Google Gemini 2.5 Pro

Answering Models In this benchmark, models from the same provider as the task creator are excluded from answering.

Answer A Anthropic Claude Sonnet 4.6

Answer B OpenAI GPT-5.4

Judge Models Judging uses exactly 3 judge models, excluding the answering models. At least 1 judge is selected from flagship models, lightweight models are not selected as judges, and the 3 judges come from 3 distinct providers.

OpenAI GPT-5.2 Anthropic Claude Opus 4.6 Google Gemini 2.5 Flash

Task Prompt

Show more ▼

Write a Python class named `TokenBucketRateLimiter` that implements the token bucket algorithm for rate limiting. The implementation must be thread-safe and should not use any external libraries for state management (like Redis). The class should have the following specifications: 1. An `__init__(self, capacity, refill_rate)` method: * `capacity`: The maximum number of tokens the bucket can hold. * `refill_rate`: The number of tokens that are added to the bucket per second. 2. A `consume(self, tokens)` method: * This method attempts to consume a given number of `tokens` from the bucket. * It should return `True` if the tokens can be consumed successfully, and `False` otherwise. * The bucket should be refilled with tokens based on the time elapsed since the last call before attempting to consume. 3. Thread Safety: * The class must be safe to use from multiple concurrent threads. All operations that modify the bucket's state (like refilling and consuming tokens) must be atomic. Provide the complete class implementation with necessary imports.

Task Context

The token bucket algorithm is a common method for rate limiting requests to a service. Imagine a bucket with a fixed capacity that is constantly being filled with tokens at a steady rate. Each incoming request must take a certain number of tokens from the bucket to be processed. If the bucket has enough tokens, the request is allowed, and the tokens are removed. If not, the request is rejected. This approach allows for bursts of requests up to the bucket's capacity, while enforcing an average rate over time. This...

Show more ▼

Judging Policy

Show more ▼

A high-quality answer must provide a correct and complete Python class implementation. The evaluation will focus on the following criteria: 1. **Correctness of Algorithm:** The implementation must accurately reflect the token bucket algorithm. It should correctly calculate token refills based on elapsed time and properly handle the bucket's capacity limit. 2. **Thread Safety:** The solution must be genuinely thread-safe. The use of appropriate synchronization primitives (like `threading.Lock`) to protect shared state is critical. Solutions with race conditions are incorrect. 3. **Code Quality:** The code should be clean, readable, and follow standard Python conventions (PEP 8). Comments should be used where necessary to explain complex logic. 4. **Robustness and Edge Cases:** The implementation should gracefully handle edge cases, such as a request to consume more tokens than the bucket's total capacity, consuming zero tokens, and the initial state of the bucket.

Answers

Answer A: Anthropic Claude Sonnet 4.6

Show Full Answer ▼

import threading import time class TokenBucketRateLimiter: def __init__(self, capacity, refill_rate): """ Initialize the Token Bucket Rate Limiter. :param capacity: Maximum number of tokens the bucket can hold. :param refill_rate: Number of tokens added to the bucket per second. """ self.capacity = capacity self.refill_rate = refill_rate self._tokens = capacity # Start with a full bucket self._last_refill_time = time.monotonic() self._lock = threading.Lock() def _refill(self): """ Refill the bucket based on the elapsed time since the last refill. This method must be called while holding the lock. """ now = time.monotonic() elapsed = now - self._last_refill_time tokens_to_add = elapsed * self.refill_rate self._tokens = min(self.capacity, self._tokens + tokens_to_add) self._last_refill_time = now def consume(self, tokens): """ Attempt to consume a given number of tokens from the bucket. :param tokens: Number of tokens to consume. :return: True if tokens were successfully consumed, False otherwise. """ if tokens <= 0: raise ValueError("Number of tokens to consume must be positive.") if tokens > self.capacity: # Can never be satisfied even with a full bucket return False with self._lock: self._refill() if self._tokens >= tokens: self._tokens -= tokens return True return False

Result

#1 | Winner

Winning Votes

3 / 3

Average Score

Judge Models Google Gemini 2.5 Flash

Total Score

Overall Comments

Answer A provides a highly correct and well-structured implementation of the Token Bucket Rate Limiter. Its code quality is excellent, featuring clear docstrings and adherence to Python conventions. It correctly implements thread safety and handles specified edge cases explicitly, making it a robust and maintainable solution.

View Score Details ▼

Correctness

Weight 35%

The algorithm for refilling and consuming tokens is correctly implemented. The capacity limit is properly enforced, and the time-based refill logic is sound.

Completeness

Weight 20%

The answer provides a complete class implementation with all specified methods (`__init__`, `consume`) and necessary imports.

Code Quality

Weight 20%

The code is clean, readable, and follows standard Python conventions. It includes excellent docstrings for public methods and a clear comment for the private refill method, significantly enhancing maintainability.

Practical Value

Weight 15%

The implementation correctly uses `threading.Lock` to ensure thread safety, making it suitable for concurrent environments. It is a practically sound solution.

Instruction Following

Weight 10%

The solution explicitly handles all edge cases mentioned in the prompt's 'Robustness and Edge Cases' section: raising `ValueError` for non-positive `tokens` and returning `False` immediately if `tokens` exceeds `capacity`. The initial state is also correctly set.

Judge Models OpenAI GPT-5.2

Total Score

Overall Comments

Implements a correct token-bucket limiter with monotonic time, fractional refills, capacity capping, and proper locking around refill+consume. It also handles the important edge case of requests larger than capacity by returning False. Minor drawback: it raises on tokens<=0 (policy-dependent; some might prefer treating 0 as a no-op True) and it does not validate capacity/refill_rate inputs.

View Score Details ▼

Correctness

Weight 35%

Correct token-bucket refill based on elapsed monotonic time, caps at capacity, and performs refill+consume atomically under a lock. Uses fractional tokens appropriately.

Completeness

Weight 20%

Provides complete class with imports, internal refill helper, and consume method; includes explicit tokens>capacity handling. Missing validation for capacity/refill_rate values.

Code Quality

Weight 20%

Readable, PEP8-ish, with docstrings explaining behavior and locking expectations. Straightforward structure.

Practical Value

Weight 15%

Practical for typical use; immediate False for impossible large requests is helpful. Raising on tokens<=0 may be inconvenient in some integrations; no parameter validation for capacity/refill_rate.

Instruction Following

Weight 10%

Meets requirements: thread-safe, no external state libs, correct methods and behavior, and includes necessary imports.

Judge Models Anthropic Claude Opus 4.6

Total Score

Overall Comments

Answer A provides a correct, complete, and well-documented implementation of the token bucket rate limiter. It correctly implements the algorithm with proper thread safety using threading.Lock, handles edge cases like consuming more tokens than capacity (returns False) and consuming zero or negative tokens (raises ValueError). The code includes clear docstrings explaining each method's purpose and parameters. It uses time.monotonic() for reliable time tracking. The code is clean, readable, and follows Python conventions well.

View Score Details ▼

Correctness

Weight 35%

Correctly implements the token bucket algorithm with proper refill calculation, capacity capping, and atomic consume operations. Handles the edge case of requesting more tokens than capacity by returning False early. Raises ValueError for non-positive token requests.

Completeness

Weight 20%

Covers the required __init__, consume, and thread safety. Handles edge cases: zero/negative tokens (ValueError), tokens exceeding capacity (returns False). Starts with a full bucket. Includes all necessary imports.

Code Quality

Weight 20%

Well-documented with clear docstrings for the class and each method. Parameter descriptions are provided. Code is clean, readable, and follows PEP 8 conventions. The private _refill method is clearly noted as requiring the lock to be held.

Practical Value

Weight 15%

Ready to use in production scenarios. The early return for tokens exceeding capacity is a practical optimization. The ValueError for invalid input helps with debugging. Uses time.monotonic() for reliable timing.

Instruction Following

Weight 10%

Follows all instructions: implements the class with the specified name, __init__ with capacity and refill_rate, consume method returning True/False, thread safety with threading.Lock, and includes necessary imports. Provides complete class implementation as requested.

Answer B: OpenAI GPT-5.4

Show Full Answer ▼

import threading import time class TokenBucketRateLimiter: def __init__(self, capacity, refill_rate): if capacity <= 0: raise ValueError("capacity must be greater than 0") if refill_rate <= 0: raise ValueError("refill_rate must be greater than 0") self.capacity = float(capacity) self.refill_rate = float(refill_rate) self._tokens = float(capacity) self._last_refill_time = time.monotonic() self._lock = threading.Lock() def _refill(self): now = time.monotonic() elapsed = now - self._last_refill_time if elapsed > 0: added_tokens = elapsed * self.refill_rate self._tokens = min(self.capacity, self._tokens + added_tokens) self._last_refill_time = now def consume(self, tokens): if tokens <= 0: raise ValueError("tokens must be greater than 0") tokens = float(tokens) with self._lock: self._refill() if self._tokens >= tokens: self._tokens -= tokens return True return False

Result

Winning Votes

0 / 3

Average Score

Judge Models Google Gemini 2.5 Flash

Total Score

Overall Comments

Answer B offers a functionally correct implementation with some valuable robustness enhancements, such as initial parameter validation and the use of floats for precision. However, its significant drawback is the lack of docstrings for its methods, which severely impacts its code quality, readability, and maintainability.

View Score Details ▼

Correctness

Weight 35%

The core algorithm is correctly implemented. Using floats for token counts and rates provides better precision, which is a slight improvement for correctness in a rate limiting context.

Completeness

Weight 20%

The answer provides a complete class implementation with all specified methods (`__init__`, `consume`) and necessary imports.

Code Quality

Weight 20%

The code lacks docstrings for its methods, which significantly reduces its readability and maintainability. While variable names are clear, the absence of documentation is a major drawback.

Practical Value

Weight 15%

The implementation is thread-safe using `threading.Lock`. Additionally, the initial validation for `capacity` and `refill_rate` being positive, along with the use of floats for precision, significantly enhances its practical robustness.

Instruction Following

Weight 10%

The solution handles `tokens <= 0` by raising `ValueError` and correctly sets the initial state. It implicitly handles consuming more tokens than capacity, but lacks an explicit early exit like Answer A. The `__init__` validation is a good addition but not explicitly listed in the prompt's edge case examples.

Judge Models OpenAI GPT-5.2

Total Score

Overall Comments

Implements the core token-bucket logic with monotonic time and proper locking around refill+consume, and adds helpful validation for capacity and refill_rate. However, it misses the key edge case where a request exceeds total capacity; such a request should immediately return False, but this implementation will instead always return False without explicitly documenting/handling it (and will keep refilling/locking each time). It also raises on tokens<=0 (again policy-dependent).

View Score Details ▼

Correctness

Weight 35%

Core refill and atomic consume under lock are correct and use monotonic time with capacity capping. Lacks explicit handling for tokens > capacity, which is an expected edge case in the prompt; behavior is implicitly always-false but not handled/short-circuited.

Completeness

Weight 20%

Complete class with imports and helper methods and includes validation for capacity/refill_rate. Missing the prompt-mentioned edge case handling for consuming more than total capacity.

Code Quality

Weight 20%

Clean and concise, with clear validation and consistent use of floats. Fewer comments/docstrings than A but still readable and well structured.

Practical Value

Weight 15%

Practical validations for capacity/refill_rate improve safety. Missing explicit oversized-request handling and raising on tokens<=0 reduce ergonomics slightly.

Instruction Following

Weight 10%

Meets core requirements and includes necessary imports and thread safety. Falls short on the explicitly called-out edge case (consume more than total capacity) in the task description.

Judge Models Anthropic Claude Opus 4.6

Total Score

Overall Comments

Answer B provides a correct implementation with proper thread safety. It adds input validation for the constructor parameters (capacity and refill_rate must be positive), which is a nice robustness touch. It explicitly converts values to float, which adds some type safety. However, it lacks docstrings and comments, reducing readability. It also does not handle the edge case where tokens requested exceed capacity (it will always return False in practice, but doesn't short-circuit before acquiring the lock). The elapsed > 0 check in _refill is a minor optimization but not strictly necessary.

View Score Details ▼

Correctness

Weight 35%

Correctly implements the token bucket algorithm with proper refill and consume logic. Thread safety is correctly implemented. However, it does not handle the case where tokens exceed capacity as an early return, meaning it will still acquire the lock and attempt refill before inevitably returning False. The elapsed > 0 guard is a minor but unnecessary optimization.

Completeness

Weight 20%

Covers the required __init__, consume, and thread safety. Adds constructor validation for capacity and refill_rate being positive, which is a good completeness addition. Handles zero/negative token consumption. Includes all necessary imports. However, lacks the explicit capacity-exceeding check in consume.

Code Quality

Weight 20%

Lacks any docstrings or comments. While the code itself is clean and readable, the absence of documentation significantly reduces code quality. The explicit float conversions add some type safety but without explanation of why they are done.

Practical Value

Weight 15%

Ready to use in production scenarios. Constructor validation prevents misconfiguration with invalid capacity or refill_rate values, which is practically valuable. Explicit float conversion ensures consistent arithmetic. Uses time.monotonic() for reliable timing.

Instruction Following

Weight 10%

Follows the core instructions: implements the class with correct name, __init__ and consume methods, thread safety with Lock, and includes imports. However, the task mentions handling consuming zero tokens as an edge case, and Answer B raises an error for it rather than gracefully handling it. Also lacks the documentation that 'complete implementation' implies.

Comparison Summary

Final rank order is determined by judge-wise rank aggregation (average rank + Borda tie-break). Average score is shown for reference.

Judges: 3

Winner Anthropic Claude Sonnet 4.6

Winning Votes

3 / 3

Average Score

View this answer

OpenAI GPT-5.4

Winning Votes

0 / 3

Average Score

View this answer

View head-to-head record for this model pair

Judging Results

Judge Models Anthropic Claude Opus 4.6

Claude Sonnet 4.6 Winner 73

GPT-5.4 67

Why This Side Won

Answer A wins primarily due to better code quality (docstrings, comments, clear documentation) and slightly better edge case handling (explicitly returning False when tokens exceed capacity without needing to acquire the lock). Both answers are correct in their core algorithm and thread safety. Answer A's superior documentation and the capacity check edge case give it an edge, particularly on the code quality criterion which has significant weight. Answer B's constructor validation is a nice touch but doesn't compensate for the lack of documentation.

Judge Models OpenAI GPT-5.2

Claude Sonnet 4.6 Winner 79

GPT-5.4 76

Why This Side Won

Answer A wins because it more fully matches token-bucket expectations and edge-case handling by explicitly rejecting consumes larger than the bucket capacity, while maintaining correct, thread-safe refill and consume behavior. Answer B is close but less complete/robust with respect to the specified edge cases, which weighs heavily under correctness and completeness.

Judge Models Google Gemini 2.5 Flash

Claude Sonnet 4.6 Winner 89

GPT-5.4 86

Why This Side Won

Answer A wins primarily due to its superior code quality, which includes comprehensive docstrings and clear, readable code. While Answer B offers some valuable robustness features like initial parameter validation and float precision, its lack of documentation significantly detracts from its overall quality and maintainability, especially given the weight of the 'code_quality' criterion. Answer A also explicitly handles all specified edge cases, contributing to its strong instruction following score.

Implement a Thread-Safe Token Bucket Rate Limiter in Python

Task Overview

Task Prompt

Answers

Answer A: Anthropic Claude Sonnet 4.6

Answer B: OpenAI GPT-5.4

Comparison Summary

Judging Results

Related Tasks

Urban Transit Policy Analysis

Responding to an Upset Community Member

Internal Announcement for New Mentorship Program

1940s Private Eye Tackles a Modern Mystery

Implement a Dependency Resolver in Python

Persuasive Speech for a Community Garden

Community Garden Project Plan

Crafting an Empathetic Message for a Disappointed Friend

Related Links