Implement a Token Bucket Rate Limiter Class

Compare model answers for this Coding benchmark and review scores, judging comments, and related examples.

Task Overview

Benchmark Genres

Coding

Task Creator Model

The task creator is randomly selected from top task-generation models of supported providers.

Google Gemini 2.5 Pro

Answering Models

In this benchmark, models from the same provider as the task creator are excluded from answering.

Answer A: Anthropic Claude Opus 4.6

Answer B: OpenAI GPT-5 mini

Judge Models

Judging uses exactly 3 judge models, excluding the answering models. At least 1 judge is selected from flagship models, lightweight models are not selected as judges, and the 3 judges come from 3 distinct providers.

OpenAI GPT-5.4, Anthropic Claude Sonnet 4.6, Google Gemini 2.5 Pro

Task Prompt

Implement a `TokenBucket` class in a programming language of your choice (e.g., Python, JavaScript, Java, C++). The class should have:

1. A constructor that accepts two arguments:
   - `capacity`: An integer representing the maximum number of tokens the bucket can hold.
   - `refill_rate`: A number representing the rate at which tokens are added to the bucket per second.
2. A method `allow_request()`:
   - This method takes no arguments.
   - It should return `True` if a request is allowed (i.e., there is at least one token in the bucket), and `False` otherwise.
   - If a request is allowed, it should consume one token from the bucket.

Your implementation must be self-contained and should not rely on external libraries for the core logic. You should manage the state of the bucket (current number of tokens and the time of the last check) within the class instance.

Task Context

The token bucket algorithm is a common method for rate limiting. It works as follows:

- The bucket has a fixed `capacity`.
- Tokens are added to the bucket at a constant `refill_rate`.
- If the bucket is full, any new tokens that are added are discarded.
- Each incoming request attempts to consume one token from the bucket.
- If a token is available, the request is allowed, and the token count is decremented.
- If no tokens are available, the request is denied.

This algorithm allows for bursts of requests up to the bucket's `capacity`, and then throttles requests to the `refill_rate` over the long term. Your implementation should accurately calculate the number of tokens to add based on the time elapsed since the last request was processed.
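The steps above can be sketched as a small replay over explicit timestamps, which makes the burst-then-throttle behavior visible without sleeping. This is an illustrative sketch, not part of either answer: the names `refill` and `simulate` are hypothetical, and it assumes the bucket starts full and that arrival times are non-decreasing.

```python
from typing import List


def refill(tokens: float, capacity: float, refill_rate: float, elapsed: float) -> float:
    """Return the token count after `elapsed` seconds, capped at `capacity`."""
    return min(capacity, tokens + elapsed * refill_rate)


def simulate(capacity: int, refill_rate: float, arrival_times: List[float]) -> List[bool]:
    """Replay requests arriving at the given timestamps (seconds, non-decreasing)."""
    tokens = float(capacity)  # assumption: the bucket starts full
    last = 0.0
    decisions = []
    for now in arrival_times:
        tokens = refill(tokens, capacity, refill_rate, now - last)
        last = now
        if tokens >= 1:
            tokens -= 1  # consume one token for an allowed request
            decisions.append(True)
        else:
            decisions.append(False)  # denied requests leave the count untouched
    return decisions


# Four requests at t=0 against capacity 3, then one more a second later at 1 token/s.
print(simulate(3, 1.0, [0.0, 0.0, 0.0, 0.0, 1.0]))
# [True, True, True, False, True]
```

The fourth request is denied because the burst has drained the bucket; the fifth succeeds because one second at 1 token/s has added exactly one token.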

Judging Policy

A high-quality answer will provide a correct, robust, and well-written implementation of the `TokenBucket` class.

- **Correctness:** The core logic must be implemented correctly. This includes accurately calculating the number of tokens to add based on the elapsed time, ensuring the token count never exceeds the capacity, and correctly consuming tokens for allowed requests.
- **Robustness:** The solution should handle edge cases, such as the first request ever made, multiple requests happening in very quick succession, and long periods of inactivity between requests. The calculation for token refills should be precise and not suffer from floating-point inaccuracies that accumulate over time.
- **Code Quality:** The code should be clean, readable, and idiomatic for the chosen language. It should be well-structured, with clear variable names and comments where the logic is non-obvious.
- **State Management:** The implementation must correctly manage its internal state (e.g., the number of tokens and the timestamp of the last update) across multiple calls to `allow_request()`.
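The floating-point concern in the Robustness criterion can be made concrete: in Python, adding the float `0.1` ten times yields `0.9999999999999999`, so a bucket that accumulates fractional tokens as floats can fall just short of a whole token. One possible mitigation, which neither answer uses, is to read the clock in integer nanoseconds and keep the token count as an exact rational. The class name `ExactTokenBucket` and the `clock_ns` parameter below are hypothetical, introduced only for this sketch.

```python
import time
from fractions import Fraction
from typing import Callable, Union

NS_PER_SECOND = 1_000_000_000


class ExactTokenBucket:
    """Token bucket sketch that avoids float accumulation (illustrative only)."""

    def __init__(
        self,
        capacity: int,
        refill_rate: Union[int, str, Fraction],
        clock_ns: Callable[[], int] = time.monotonic_ns,
    ) -> None:
        self._capacity = Fraction(capacity)
        # Pass a rate such as Fraction(1, 10) or "0.1" to get exactly one tenth;
        # a float 0.1 would be converted to its (slightly different) binary value.
        self._rate = Fraction(refill_rate)
        self._tokens = Fraction(capacity)  # assumption: start full
        self._clock_ns = clock_ns
        self._last_ns = clock_ns()

    def allow_request(self) -> bool:
        now_ns = self._clock_ns()
        elapsed_ns = now_ns - self._last_ns
        self._last_ns = now_ns
        # Exact rational arithmetic: no rounding error is carried between calls.
        added = self._rate * Fraction(elapsed_ns, NS_PER_SECOND)
        self._tokens = min(self._capacity, self._tokens + added)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

The trade-off is speed: `Fraction` arithmetic is slower than float arithmetic, so this approach suits cases where exactness matters more than throughput.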

Answers

Answer A: Anthropic Claude Opus 4.6

import time


class TokenBucket:
    """A token bucket rate limiter.

    Tokens are added at a constant `refill_rate` (tokens per second) up to
    `capacity`.  Each call to `allow_request()` tries to consume one token.
    """

    def __init__(self, capacity: int, refill_rate: float) -> None:
        """
        Args:
            capacity: Maximum number of tokens the bucket can hold.
            refill_rate: Number of tokens added to the bucket per second.
        """
        self.capacity: int = capacity
        self.refill_rate: float = refill_rate
        self.tokens: float = float(capacity)  # start full
        self.last_refill_time: float = time.monotonic()

    def _refill(self) -> None:
        """Add tokens based on elapsed time since the last refill."""
        now = time.monotonic()
        elapsed = now - self.last_refill_time
        tokens_to_add = elapsed * self.refill_rate
        self.tokens = min(self.capacity, self.tokens + tokens_to_add)
        self.last_refill_time = now

    def allow_request(self) -> bool:
        """Attempt to consume one token.

        Returns:
            True if the request is allowed (a token was available and consumed),
            False otherwise.
        """
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# ---------------------------------------------------------------------------
# Quick demo / smoke test
# ---------------------------------------------------------------------------
if __name__ == "__main__":
    bucket = TokenBucket(capacity=5, refill_rate=2)  # 5 max, 2 tokens/sec

    # Burst: should allow 5 requests immediately
    for i in range(7):
        result = bucket.allow_request()
        print(f"Request {i + 1}: {'allowed' if result else 'denied'}")

    # Wait a bit for tokens to refill
    print("\nSleeping 1.5 seconds...")
    time.sleep(1.5)  # should refill ~3 tokens

    for i in range(5):
        result = bucket.allow_request()
        print(f"Request {i + 1}: {'allowed' if result else 'denied'}")

Result

Winning Votes

0 / 3

Average Score

Judge Models Anthropic Claude Sonnet 4.6

Total Score

80

Overall Comments

Answer A provides a clean, correct, and well-documented implementation of the TokenBucket class. It uses time.monotonic() for accurate timing, correctly refills tokens based on elapsed time, caps tokens at capacity, and consumes one token per allowed request. The code is readable with good docstrings and includes a helpful demo/smoke test. However, it lacks thread safety (no locking mechanism), which is a notable omission for a rate limiter that would typically be used in concurrent environments. It also lacks input validation for constructor arguments.

Correctness

Weight 35%

The core logic is correct: tokens are refilled based on elapsed time using time.monotonic(), capped at capacity, and one token is consumed per allowed request. However, the lack of thread safety means the implementation could produce incorrect results in concurrent usage due to race conditions.

Completeness

Weight 20%

The implementation covers all required elements: constructor with capacity and refill_rate, allow_request() method, and internal state management. It includes a runnable demo. However, it is missing thread safety and input validation, which are important for a complete implementation.

Code Quality

Weight 20%

The code is clean, readable, and well-documented with clear docstrings and type hints. Variable names are descriptive. The separation of refill logic into a private method is good design. Minor deduction for lack of private attribute naming conventions.

Practical Value

Weight 15%

The implementation is practically useful for single-threaded scenarios and includes a helpful demo. However, the lack of thread safety significantly limits its practical value in real-world applications where rate limiters are typically used in concurrent environments.

Instruction Following

Weight 10%

Follows all instructions: implements TokenBucket class in Python, constructor accepts capacity and refill_rate, allow_request() takes no arguments and returns True/False while consuming a token. Uses only standard library (time module). State is managed within the instance.

Judge Models Google Gemini 2.5 Pro

Total Score

82

Overall Comments

Answer A provides a correct and well-written implementation for a single-threaded environment. Its strengths are clean code with modern Python type hints, clear logic, and an excellent runnable demo that verifies its behavior. However, its major weakness is the lack of thread safety, which makes it unsuitable for most real-world applications of a rate limiter where concurrent requests are common.

Correctness

Weight 35%

The implementation is logically correct for a single-threaded context. The token refill calculation and capacity limiting are implemented properly. However, it is not correct in a concurrent environment, which is a common use case for this algorithm.

Completeness

Weight 20%

The answer is very complete, providing not just the class implementation but also a runnable demo script under `if __name__ == "__main__":`. This demo effectively showcases the burst and refill behavior, making it easy to understand and verify.

Code Quality

Weight 20%

The code quality is high. It uses modern Python features like type hints, has clear docstrings, and follows good naming conventions. The logic is well-structured with a private helper method for refilling.

Practical Value

Weight 15%

The practical value is limited because the implementation is not thread-safe. Rate limiters are almost always used in concurrent environments (like web servers), where this implementation would be subject to race conditions and would not work reliably.
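The race the judges describe comes from the check-then-decrement step: two threads can both see a token and both consume it. A minimal sketch of one remedy is to wrap any limiter in a lock without changing its code. Everything below is hypothetical and for illustration only: `FixedBudget` stands in for a bucket with `refill_rate=0`, `Synchronized` is an assumed wrapper name, and `count_allowed` is a small harness. Under CPython the unsynchronized version may often produce the right count anyway, but that is not guaranteed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FixedBudget:
    """Stand-in for a bucket with refill_rate=0: only check-then-decrement."""

    def __init__(self, tokens: int) -> None:
        self.tokens = tokens

    def allow_request(self) -> bool:
        if self.tokens >= 1:   # check ...
            self.tokens -= 1   # ... then act: not atomic across threads
            return True
        return False


class Synchronized:
    """Serialize calls to any object exposing allow_request()."""

    def __init__(self, limiter) -> None:
        self._limiter = limiter
        self._lock = threading.Lock()

    def allow_request(self) -> bool:
        with self._lock:
            return self._limiter.allow_request()


def count_allowed(limiter, workers: int, requests_per_worker: int) -> int:
    """Hammer the limiter from several threads and count allowed requests."""

    def hammer(_worker_id: int) -> int:
        return sum(limiter.allow_request() for _ in range(requests_per_worker))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(hammer, range(workers)))


allowed = count_allowed(Synchronized(FixedBudget(1000)), workers=8, requests_per_worker=500)
print(allowed)  # 1000: with the lock, exactly the budget is granted
```

Answer B takes the equivalent approach internally by holding `self._lock` around the refill and the token check in `allow_request()`.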

Instruction Following

Weight 10%

100

The answer perfectly follows all instructions in the prompt, implementing the specified class, constructor, and method with the correct logic. The solution is self-contained.

Judge Models OpenAI GPT-5.4

Total Score

75

Overall Comments

Answer A provides a correct and readable Python implementation of a token bucket. It uses a monotonic clock, tracks internal state properly, refills based on elapsed time, caps tokens at capacity, and consumes one token per allowed request. The main weaknesses are lack of input validation, no protection for concurrent access, and a basic float-based approach without addressing precision concerns mentioned in the prompt. The included demo is helpful but not necessary for the core task.

Correctness

Weight 35%

The refill and consumption logic are correct: elapsed time is measured with a monotonic clock, tokens are added proportionally, capped at capacity, and one token is consumed when available. It handles first use and long idle periods properly. The score is held back because it does not address invalid parameters and uses straightforward floating-point accumulation without any mitigation.

Completeness

Weight 20%

It includes the required constructor, internal state, and allow_request method, and the implementation is self-contained. However, it does not cover parameter validation or discuss edge conditions such as invalid capacity or refill rate.
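As an illustration of the parameter validation this comment refers to, the sketch below uses the same bounds as Answer B (`capacity > 0`, `refill_rate >= 0`) and adds type and finiteness checks as assumptions of its own. The function name `validate_bucket_args` is hypothetical. The finiteness check matters because `float("nan") < 0` evaluates to `False`, so a plain `< 0` comparison lets NaN through.

```python
import math


def validate_bucket_args(capacity, refill_rate) -> None:
    """Raise TypeError or ValueError for arguments a token bucket cannot use."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(capacity, bool) or not isinstance(capacity, int):
        raise TypeError("capacity must be an int")
    if capacity <= 0:
        raise ValueError("capacity must be > 0")
    if isinstance(refill_rate, bool) or not isinstance(refill_rate, (int, float)):
        raise TypeError("refill_rate must be an int or float")
    if not math.isfinite(refill_rate):
        raise ValueError("refill_rate must be finite")
    if refill_rate < 0:
        raise ValueError("refill_rate must be >= 0")


validate_bucket_args(5, 2)      # passes silently
validate_bucket_args(10, 1.5)   # passes silently
```

A constructor could call this first and only then initialize its state, so an instance never exists with unusable parameters.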

Code Quality

Weight 20%

The code is clean, readable, and idiomatic, with good naming and helpful docstrings. The structure is simple and easy to follow. It loses some points because the public state is exposed directly and the extra demo code is not part of the core implementation.

Practical Value

Weight 15%

It is usable for simple single-threaded scenarios and demonstrates behavior with a quick smoke test. In more realistic settings, the lack of validation and thread safety reduces operational usefulness.

Instruction Following

Weight 10%

It follows the task well by implementing a self-contained TokenBucket class with the required constructor and allow_request behavior. The only slight issue is inclusion of extra demo code beyond the requested core answer.

Answer B: OpenAI GPT-5 mini

import time
import threading


class TokenBucket:
    """
    A simple token bucket rate limiter.

    Args:
        capacity (int): maximum number of tokens in the bucket.
        refill_rate (float): tokens added per second.

    Methods:
        allow_request() -> bool: returns True and consumes one token if available, otherwise False.
    """

    def __init__(self, capacity, refill_rate):
        if capacity <= 0:
            raise ValueError("capacity must be > 0")
        if refill_rate < 0:
            raise ValueError("refill_rate must be >= 0")

        self.capacity = float(capacity)
        self.refill_rate = float(refill_rate)

        # current tokens may be fractional to account for partial refill
        self._tokens = float(capacity)
        self._last_time = time.monotonic()
        self._lock = threading.Lock()

    def _refill(self):
        """Refill tokens based on time elapsed since last refill."""
        now = time.monotonic()
        elapsed = now - self._last_time
        if elapsed <= 0:
            return
        added = elapsed * self.refill_rate
        # cap tokens at capacity
        self._tokens = min(self.capacity, self._tokens + added)
        self._last_time = now

    def allow_request(self):
        """
        Attempt to consume one token.

        Returns:
            bool: True if a token was available and consumed, False otherwise.
        """
        with self._lock:
            self._refill()
            if self._tokens >= 1.0:
                self._tokens -= 1.0
                return True
            return False


# Example usage:
# bucket = TokenBucket(capacity=10, refill_rate=1.5)  # 1.5 tokens per second
# if bucket.allow_request():
#     # process request
# else:
#     # reject or throttle request

Result

#1 | Winner

Winning Votes

3 / 3

Average Score

Judge Models Anthropic Claude Sonnet 4.6

Total Score

87

Overall Comments

Answer B provides a correct, robust, and production-quality implementation of the TokenBucket class. It includes all the core logic of Answer A but adds important enhancements: thread safety via threading.Lock(), input validation with meaningful error messages for invalid constructor arguments, and private naming conventions (_tokens, _last_time, _lock) for better encapsulation. The guard against elapsed <= 0 is a minor defensive touch. The code is clean, well-documented, and follows Python conventions. The only minor weakness is the lack of a runnable demo, but the example usage comments compensate partially.

Correctness

Weight 35%

The core logic is correct and additionally protected by a threading.Lock(), preventing race conditions in concurrent environments. Input validation ensures the object is always in a valid state. The elapsed <= 0 guard is a minor but sensible defensive check.

Completeness

Weight 20%

The implementation covers all required elements and goes further with thread safety, input validation, and proper encapsulation. The example usage comments provide guidance. The only minor gap is the absence of a runnable demo, but this is a minor concern.

Code Quality

Weight 20%

The code is clean, readable, and well-structured. It uses private naming conventions (_tokens, _last_time, _lock) for better encapsulation, includes clear docstrings, and follows Python conventions. The class-level docstring is comprehensive. Slightly better than A in terms of encapsulation and defensive programming.

Practical Value

Weight 15%

The implementation is highly practical for real-world use. Thread safety makes it suitable for concurrent applications, input validation prevents misuse, and the clean API makes it easy to integrate. The example usage comments further aid practical adoption.

Instruction Following

Weight 10%

Judge Models Google Gemini 2.5 Pro

Total Score

85

Overall Comments

Answer B provides a robust and practical implementation of the token bucket algorithm. Its key strengths are its thread-safety, achieved using a lock, and its inclusion of input validation, making it suitable for use in concurrent, production-like environments. While the code is well-structured, it lacks type hints and a runnable demonstration, which would have improved its clarity and completeness.

Correctness

Weight 35%

The implementation is logically correct for both single-threaded and multi-threaded contexts due to the use of a lock. This makes the solution correct in a much wider and more realistic range of scenarios.

Completeness

Weight 20%

The answer provides the required class but only includes a commented-out snippet for example usage. It lacks a runnable demonstration or test cases to verify its functionality, making it less complete than Answer A.
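A runnable check of the kind this comment asks for does not need real sleeps. The sketch below patches `time.monotonic` with a controllable fake so the burst, refill, and long-idle cases run deterministically. The `TokenBucket` here is a compact restatement of the refill logic both answers share, without Answer B's lock or validation, included only to keep the example self-contained; `run_checks` is a hypothetical helper name.

```python
import time
from unittest import mock


class TokenBucket:
    """Compact restatement of the shared refill logic (no lock, no validation)."""

    def __init__(self, capacity, refill_rate):
        self.capacity = float(capacity)
        self.refill_rate = float(refill_rate)
        self._tokens = float(capacity)
        self._last_time = time.monotonic()

    def allow_request(self):
        now = time.monotonic()
        elapsed = now - self._last_time
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        self._last_time = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


def run_checks() -> None:
    fake_now = [100.0]  # mutable cell so the test can advance the clock
    with mock.patch("time.monotonic", lambda: fake_now[0]):
        bucket = TokenBucket(capacity=2, refill_rate=0.5)

        # First requests: the bucket starts full, so a burst of 2 is allowed.
        assert bucket.allow_request()
        assert bucket.allow_request()
        assert not bucket.allow_request()

        # Refill: 2 s at 0.5 tokens/s adds exactly one token.
        fake_now[0] += 2.0
        assert bucket.allow_request()
        assert not bucket.allow_request()

        # Long idle period: the refill is capped at capacity.
        fake_now[0] += 3600.0
        assert [bucket.allow_request() for _ in range(3)] == [True, True, False]


run_checks()
print("all checks passed")
```

Patching the clock only works because the class looks up `time.monotonic` through the module at call time; an implementation that did `from time import monotonic` would need the patch applied to its own module instead.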

Code Quality

Weight 20%

The code quality is good, with clear docstrings, sensible variable names, and the inclusion of input validation, which is a good practice. However, it lacks type hints, which are standard in modern Python for improving readability and maintainability.

Practical Value

Weight 15%

The practical value is extremely high. By including a threading lock, the implementation is thread-safe and ready to be used in real-world, multi-threaded applications. The addition of input validation further enhances its practicality and robustness.

Instruction Following

Weight 10%

100

The answer perfectly follows all instructions in the prompt, implementing the specified class, constructor, and method with the correct logic. The solution is self-contained, as `threading` is part of the standard library.

Judge Models OpenAI GPT-5.4

Total Score

84

Overall Comments

Answer B provides a correct and more robust Python implementation. It uses a monotonic clock, validates constructor inputs, manages state cleanly, caps refills correctly, and consumes tokens safely. It also adds thread safety with a lock, making it more practical in real applications. Its only minor drawback is that it still relies on floating-point arithmetic without any special mitigation for long-term precision accumulation, but overall it is stronger and more production-oriented.

Correctness

Weight 35%

The core token bucket logic is correct and state updates are done consistently. It correctly refills based on elapsed monotonic time, caps at capacity, and consumes one token per successful request. Input validation strengthens correctness, and locking prevents race-related inconsistencies in multithreaded use. Minor reduction only because it still uses plain floating-point arithmetic.

Completeness

Weight 20%

It fully implements the requested class and methods, manages internal state correctly, and remains self-contained. It also covers important edge handling through constructor validation and safe access patterns, making it more complete than the baseline requirement.

Code Quality

Weight 20%

The code is well-structured, readable, and practical, with clear naming, concise comments, and good encapsulation through underscored attributes. The lock usage is cleanly integrated and the implementation stays focused on the class itself.

Practical Value

Weight 15%

It is closer to production use because it validates inputs and is safe under concurrent access. The implementation is still compact and easy to adopt directly in applications, which gives it strong practical value.

Instruction Following

Weight 10%

It follows the instructions very well: it provides a self-contained TokenBucket class with the required constructor, state handling, and allow_request behavior. The answer stays focused and aligns closely with the expected code-only implementation.

Comparison Summary

Final rank order is determined by judge-wise rank aggregation (average rank + Borda tie-break). Average score is shown for reference.

Judges: 3

Anthropic Claude Opus 4.6

Winning Votes

0 / 3

Average Score

Winner OpenAI GPT-5 mini

Winning Votes

3 / 3

Average Score

Judging Results

Judge Models OpenAI GPT-5.4

Claude Opus 4.6 75

GPT-5 mini Winner 84

Why This Side Won

Answer B wins because it matches the required token bucket behavior while being more robust and practical. Compared with Answer A, it adds input validation and thread safety, which materially improve reliability without compromising clarity. Both answers are correct in core logic, but B better satisfies the benchmark's robustness and practical-value expectations.

Judge Models Google Gemini 2.5 Pro

Claude Opus 4.6 82

GPT-5 mini Winner 85

Why This Side Won

Answer B is the winner because it provides a thread-safe implementation, which is a critical requirement for a rate limiter in almost any practical application. The prompt's requirement for a 'robust' solution that handles 'multiple requests happening in very quick succession' strongly implies the need for concurrency control. While Answer A has better presentation with a runnable demo and type hints, its lack of thread safety is a fundamental flaw in robustness for this specific problem. Answer B's solution is more robust and has significantly higher practical value, making it the superior engineering solution.

Judge Models Anthropic Claude Sonnet 4.6

Claude Opus 4.6 80

GPT-5 mini Winner 87

Why This Side Won

Answer B wins because it includes thread safety via threading.Lock(), which is critical for a rate limiter used in real-world concurrent scenarios. It also adds input validation and better encapsulation through private attributes. Both answers are functionally correct, but Answer B is more robust and production-ready, making it the stronger implementation overall.
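The Comparison Summary says the final order comes from judge-wise rank aggregation (average rank plus a Borda tie-break), but the page does not define how either is computed. The sketch below is one plausible reading, applied to the per-judge scores listed in the Judging Results: each judge ranks the answers by score, ranks are averaged, and Borda points (answers beaten per judge) break ties. The function name `aggregate` and the tie handling are assumptions.

```python
from typing import Dict, List, Tuple

# Scores as listed in the Judging Results section.
JUDGE_SCORES: Dict[str, Dict[str, int]] = {
    "OpenAI GPT-5.4": {"Claude Opus 4.6": 75, "GPT-5 mini": 84},
    "Google Gemini 2.5 Pro": {"Claude Opus 4.6": 82, "GPT-5 mini": 85},
    "Anthropic Claude Sonnet 4.6": {"Claude Opus 4.6": 80, "GPT-5 mini": 87},
}


def aggregate(judge_scores: Dict[str, Dict[str, int]]) -> List[Tuple[str, float, int]]:
    """Return (answer, average rank, Borda points) rows, best answer first."""
    answers = sorted(next(iter(judge_scores.values())))
    rank_sum = {a: 0 for a in answers}
    borda = {a: 0 for a in answers}
    for per_answer in judge_scores.values():
        # Assumption: higher score means better rank; equal scores keep name order.
        ordered = sorted(answers, key=lambda a: per_answer[a], reverse=True)
        for position, answer in enumerate(ordered):
            rank_sum[answer] += position + 1
            borda[answer] += len(answers) - 1 - position
    judges = len(judge_scores)
    table = [(a, rank_sum[a] / judges, borda[a]) for a in answers]
    return sorted(table, key=lambda row: (row[1], -row[2]))


for answer, avg_rank, borda_points in aggregate(JUDGE_SCORES):
    print(f"{answer}: average rank {avg_rank:.2f}, Borda points {borda_points}")
# GPT-5 mini: average rank 1.00, Borda points 3
# Claude Opus 4.6: average rank 2.00, Borda points 0
```

In this simplified form the Borda points are a linear function of the rank sum whenever every judge ranks every answer, so the tie-break never changes the order; the site's actual tie-break presumably differs in some detail that the page does not state.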
