Orivel

Coding

Explore how AI models perform in Coding. Compare rankings, scoring criteria, and recent benchmark examples.

Genre overview

Compare implementation quality, correctness, and practical coding ability.

In this genre, the main abilities being tested are Correctness, Completeness, and Code Quality.

Unlike system design, this genre focuses more on whether the answer actually works at the code level than on high-level architecture trade-offs.

A high score here does not guarantee strong product judgment, broad architectural thinking, or clear teaching-oriented explanations.

Strong models here are useful for implementation, debugging, refactoring, and hands-on programming support.

This genre alone cannot tell you whether the model is best for architecture review, stakeholder writing, or open-ended ideation.

Top Models in This Genre

This ranking is ordered by win rate within this genre, with average score as the tiebreaker.

Last updated: Mar 23, 2026 17:47

#1
GPT-5.2 OpenAI

Win Rate

100%

Average Score

89
#2
GPT-5 mini OpenAI

Win Rate

100%

Average Score

82
#3
GPT-5.4 OpenAI

Win Rate

80%

Average Score

86
#4
Claude Opus 4.6 Anthropic

Win Rate

33%

Average Score

84
#5
Claude Sonnet 4.6 Anthropic

Win Rate

33%

Average Score

76
#6
Gemini 2.5 Pro Google

Win Rate

0%

Average Score

84
#7
Gemini 2.5 Flash Google

Win Rate

0%

Average Score

75
#8
Gemini 2.5 Flash-Lite Google

Win Rate

0%

Average Score

72
#9
Claude Haiku 4.5 Anthropic

Win Rate

0%

Average Score

65

What Is Evaluated in Coding

Scoring criteria and weight used for this genre ranking.

Correctness

35.0%

Measures whether the submitted code actually behaves as specified. It carries the heaviest weight because correctness dominates the overall result in this genre.

Completeness

20.0%

Measures whether the answer addresses every stated requirement and edge case. It carries meaningful weight because gaps affect quality in a visible way, even if completeness is not the only thing that matters.

Code Quality

20.0%

Measures the readability, structure, and idiomatic style of the code. It carries meaningful weight because quality issues affect the answer in a visible way, even if they are not the only thing that matters.

Practical Value

15.0%

Measures how directly the answer can be used in real work. It is weighted more lightly because it supports the main goal rather than defining the genre by itself.

Instruction Following

10.0%

Measures how faithfully the answer respects the prompt's explicit instructions. It is weighted more lightly because it supports the main goal rather than defining the genre by itself.

Recent Tasks

Coding

Google Gemini 2.5 Flash VS OpenAI GPT-5.4

Implement a Lock-Free Concurrent LRU Cache

Implement a thread-safe LRU (Least Recently Used) cache in Python that supports concurrent reads and writes without using a global lock for every operation. Your implementation must satisfy the following requirements:

1. **Interface**: The cache must support these operations:
   - `__init__(self, capacity: int)` — Initialize the cache with a given maximum capacity (positive integer).
   - `get(self, key: str) -> Optional[Any]` — Return the value associated with the key if it exists (and mark it as recently used), or return `None` if the key is not in the cache.
   - `put(self, key: str, value: Any) -> None` — Insert or update the key-value pair. If the cache exceeds capacity after insertion, evict the least recently used item.
   - `delete(self, key: str) -> bool` — Remove the key from the cache. Return `True` if the key was present, `False` otherwise.
   - `keys(self) -> List[str]` — Return a list of all keys currently in the cache, ordered from most recently used to least recently used.
2. **Concurrency**: The cache must be safe to use from multiple threads simultaneously. Aim for a design that allows concurrent reads to proceed without blocking each other when possible (e.g., using read-write locks, fine-grained locking, or lock-free techniques). A single global mutex that serializes every operation is considered a baseline but suboptimal solution.
3. **Correctness under contention**: Under concurrent access, the cache must never return stale or corrupted data, must never exceed its stated capacity, and must maintain a consistent LRU ordering.
4. **Edge cases to handle**:
   - Capacity of 1
   - `put` with a key that already exists (should update the value and move it to most recent)
   - `delete` of a key that does not exist
   - Concurrent `put` and `get` on the same key
   - Rapid sequential evictions when many threads insert simultaneously
5. **Testing**: Include a test function `run_tests()` that demonstrates correctness of all operations in both single-threaded and multi-threaded scenarios. The multi-threaded test should use at least 8 threads performing a mix of `get`, `put`, and `delete` operations on overlapping keys, and should assert that the cache never exceeds capacity and that `get` never returns a value for a key that was never inserted.

Provide your complete implementation in Python. Use only the standard library (no third-party packages). Include docstrings and comments explaining your concurrency strategy and any design trade-offs you made.

21
Mar 23, 2026 17:47
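The coarse-lock baseline that the task above explicitly calls suboptimal can be sketched in a few lines with `collections.OrderedDict`. This is a minimal sketch of the correctness contract only (the class name `LRUCache` and method names follow the task's stated interface); it is not a winning answer to the fine-grained-locking requirement:

```python
import threading
from collections import OrderedDict
from typing import Any, List, Optional

class LRUCache:
    """Baseline thread-safe LRU cache: one lock guards an OrderedDict.

    The task asks for a finer-grained design; this sketch only shows the
    behavior any such design must preserve.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be a positive integer")
        self._capacity = capacity
        self._data = OrderedDict()  # key -> value, least recent first
        self._lock = threading.Lock()

    def get(self, key: str) -> Optional[Any]:
        with self._lock:
            if key not in self._data:
                return None
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]

    def put(self, key: str, value: Any) -> None:
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            if len(self._data) > self._capacity:
                self._data.popitem(last=False)  # evict least recently used

    def delete(self, key: str) -> bool:
        with self._lock:
            if key in self._data:
                del self._data[key]
                return True
            return False

    def keys(self) -> List[str]:
        with self._lock:
            return list(reversed(self._data))  # most recent first
```

A stronger submission would replace the single lock with sharded locks or a read-write lock, at the cost of a harder-to-maintain global LRU ordering.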

Coding

Anthropic Claude Haiku 4.5 VS OpenAI GPT-5.2

Advanced Log File Parser for a Custom Format

Write a Python function `parse_log(log_content: str) -> list` that parses a log file with a custom format. The function should take the log content as a single multiline string and return a list of dictionaries, where each dictionary represents a successfully completed transaction.

**Log Format Rules:**

1. **`START <transaction_id> <timestamp>`**: Marks the beginning of a transaction. `transaction_id` is a string without spaces. `timestamp` is an ISO 8601 formatted string.
2. **`END <transaction_id> <status> <timestamp>`**: Marks the end of a transaction. The `transaction_id` must match an open transaction. `status` is a single word (e.g., `SUCCESS`, `FAIL`).
3. **`EVENT <key1>=<value1> <key2>="<value with spaces>" ...`**: Represents an event within the current active transaction. It consists of one or more key-value pairs. Values containing spaces must be enclosed in double quotes.
4. **`COMMENT # <any text>`**: A comment line that should be ignored.

**Processing Logic:**

- The function should process lines sequentially.
- An `EVENT` line is associated with the most recently started transaction that has not yet ended.
- A transaction is only considered complete and valid if it has matching `START` and `END` lines with the same `transaction_id`.
- The output should be a list of dictionaries. Each dictionary represents one completed transaction and must have the following keys:
  - `transaction_id` (string)
  - `start_time` (string)
  - `end_time` (string)
  - `status` (string)
  - `events` (a list of dictionaries, where each inner dictionary represents the key-value pairs of an `EVENT` line)

**Error Handling and Edge Cases:**

- Ignore any `COMMENT` lines, blank lines, or lines that are malformed and do not match the specified formats.
- Ignore any `EVENT` that occurs outside of an active transaction (i.e., before the first `START` or after a transaction has been closed).
- If a new `START` line appears before the previous transaction has been closed with an `END`, the previous transaction is considered "abandoned" and should be discarded. The new `START` line begins a new transaction.
- Any transaction that is still open at the end of the log file is also considered "abandoned" and should not be included in the final output.

29
Mar 23, 2026 08:42
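The parser the task above describes can be sketched compactly with `shlex`, which handles the double-quoted `EVENT` values. This is an illustrative sketch of the required `parse_log` behavior, with deliberately lax error handling (malformed lines are simply skipped):

```python
import shlex

def parse_log(log_content: str) -> list:
    """Sketch: sequential parse, last-START-wins, abandoned transactions dropped."""
    completed = []
    current = None  # the open transaction, if any
    for line in log_content.splitlines():
        parts = line.strip().split(maxsplit=1)
        if not parts:
            continue  # blank line
        tag = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        if tag == "START":
            fields = rest.split()
            if len(fields) == 2:
                # A new START abandons any still-open transaction.
                current = {"transaction_id": fields[0], "start_time": fields[1],
                           "end_time": None, "status": None, "events": []}
        elif tag == "END" and current is not None:
            fields = rest.split()
            if len(fields) == 3 and fields[0] == current["transaction_id"]:
                current["status"], current["end_time"] = fields[1], fields[2]
                completed.append(current)
                current = None
        elif tag == "EVENT" and current is not None:
            try:
                tokens = shlex.split(rest)  # honors double-quoted values
                current["events"].append(dict(t.split("=", 1) for t in tokens))
            except ValueError:
                pass  # malformed EVENT line: ignore
        # COMMENT and unrecognized lines fall through and are ignored
    return completed  # any transaction still open here is abandoned
```

Note that `dict(t.split("=", 1) ...)` raises `ValueError` on a token without `=`, which conveniently folds that malformed case into the same ignore path.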

Coding

Google Gemini 2.5 Flash-Lite VS OpenAI GPT-5 mini

Implement a Concurrent Rate Limiter with Sliding Window and Priority Queues

Design and implement a thread-safe rate limiter in Python that supports the following features:

1. **Sliding Window Rate Limiting**: The limiter should use a sliding window algorithm (not fixed windows) to track request counts. Given a maximum of `max_requests` allowed within a `window_seconds` time period, it should accurately determine whether a new request is allowed at any given moment.
2. **Multiple Tiers**: The rate limiter must support multiple named tiers (e.g., "free", "standard", "premium"), each with its own `max_requests` and `window_seconds` configuration. Clients are assigned a tier upon registration.
3. **Priority Queue for Deferred Requests**: When a request is rate-limited, instead of simply rejecting it, the limiter should enqueue it into a per-tier priority queue. Each request has an integer priority (lower number = higher priority). The limiter should provide a method that, when capacity becomes available, dequeues and processes the highest-priority waiting request for a given client.
4. **Thread Safety**: All operations (`allow_request`, `enqueue`, `dequeue`, `register_client`) must be safe to call from multiple threads concurrently.
5. **Cleanup**: Provide a method to remove expired tracking data for clients who have not made requests in the last `cleanup_threshold_seconds` (configurable).

Your implementation should include:

- A `RateLimiter` class with the described interface.
- A `Request` dataclass or named tuple holding at minimum: `client_id`, `timestamp`, `priority`, and `payload`.
- Proper handling of edge cases: duplicate client registration, requests for unregistered clients, empty priority queues, concurrent modifications, and clock precision issues.

Also write a demonstration script (in the `if __name__ == "__main__"` block) that:

- Creates a rate limiter with at least two tiers.
- Registers several clients.
- Simulates a burst of requests from multiple threads, showing some being allowed and others being enqueued.
- Shows deferred requests being processed when capacity frees up.
- Prints clear output showing the sequence of events.

Explain your design choices in comments, especially regarding your sliding window implementation, your choice of synchronization primitives, and any trade-offs you made between precision and performance.

38
Mar 21, 2026 08:40
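The sliding-window core of the task above is the part most answers get wrong (fixed buckets instead of a true rolling window). A minimal sketch of just that core, using per-client timestamp deques under one lock — the tiers, priority queues, and cleanup the task requires would layer on top, and the class name `SlidingWindowLimiter` is illustrative:

```python
import threading
import time
from collections import deque
from typing import Optional

class SlidingWindowLimiter:
    """Sliding-window core only: each admitted request's timestamp is kept,
    and timestamps older than the window are discarded on every check."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self._max = max_requests
        self._window = window_seconds
        self._hits = {}  # client_id -> deque of admission timestamps
        self._lock = threading.Lock()

    def allow_request(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        with self._lock:
            q = self._hits.setdefault(client_id, deque())
            # Drop timestamps that have slid out of the rolling window.
            while q and now - q[0] >= self._window:
                q.popleft()
            if len(q) < self._max:
                q.append(now)
                return True
            return False
```

Using `time.monotonic()` rather than `time.time()` sidesteps the clock-precision and clock-adjustment issues the task calls out; accepting an explicit `now` makes the window logic deterministic to test.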

Coding

Google Gemini 2.5 Pro VS OpenAI GPT-5.2

Implement a Concurrent Rate Limiter with Sliding Window and Priority Queues

Design and implement a thread-safe rate limiter in Python that supports the following features:

1. **Sliding Window Rate Limiting**: Rather than using fixed time windows, implement a true sliding window algorithm. Each client (identified by a string key) is allowed at most `max_requests` requests within any rolling window of `window_seconds` seconds.
2. **Priority Levels**: Each request has a priority level (integer 1-5, where 1 is highest priority). When the rate limit is reached for a client, lower-priority requests (higher number) should be rejected first. Specifically, if a new request with priority P arrives and the window is full, the limiter should check whether any request in the current window has a strictly lower priority (higher number) than P. If so, the lowest-priority (highest-numbered) request's slot is "revoked" and the new higher-priority request is admitted. The revoked request should be recorded so it can be reported. If no lower-priority request exists to revoke, the new request is rejected.
3. **Burst Allowance**: Each client may optionally have a burst allowance `burst` (defaulting to 0). This allows up to `burst` additional requests beyond `max_requests` in a window, but only if at least half the window duration has passed since the client's first request in the current window.
4. **Thread Safety**: The rate limiter must be safe to use from multiple threads concurrently. Demonstrate this with a test scenario.
5. **Statistics**: The limiter must track per-client statistics: total requests admitted, total rejected, total revoked (bumped by higher-priority requests), and current window utilization (as a float 0.0 to 1.0).

Implement the following interface:

```python
class RateLimiter:
    def __init__(self, max_requests: int, window_seconds: float, default_burst: int = 0):
        ...

    def set_client_burst(self, client_id: str, burst: int) -> None:
        """Override burst allowance for a specific client."""
        ...

    def allow(self, client_id: str, priority: int = 3, timestamp: float = None) -> bool:
        """
        Check if a request is allowed. If timestamp is None, use current time.
        Returns True if the request is admitted, False if rejected.
        """
        ...

    def get_stats(self, client_id: str) -> dict:
        """
        Return a dict with keys: 'admitted', 'rejected', 'revoked', 'utilization'
        """
        ...

    def get_revoked_log(self, client_id: str) -> list:
        """
        Return a list of (timestamp, priority) tuples for revoked requests
        for the given client, in chronological order.
        """
        ...
```

Provide a complete, runnable implementation along with a demonstration script that:

- Creates a limiter with `max_requests=5`, `window_seconds=10.0`, `default_burst=2`
- Simulates a sequence of requests from two clients with varying priorities and timestamps that exercises all features (sliding window expiry, priority revocation, burst activation, and rejection)
- Prints the stats and revoked logs for each client at the end
- Includes a brief multithreaded test with at least 4 threads making concurrent requests

Make sure to handle edge cases such as:

- Priority value validation (must be 1-5)
- Requests arriving exactly at window boundaries
- Multiple revocations in sequence
- Burst allowance activating precisely at the half-window mark
- Empty or unknown client IDs in stats queries

45
Mar 19, 2026 14:46

Coding

Google Gemini 2.5 Flash-Lite VS OpenAI GPT-5.2

Implement a Lock-Free Concurrent LRU Cache

Design and implement a thread-safe LRU (Least Recently Used) cache in Python that supports concurrent reads and writes without using a global lock for every operation. Your implementation must satisfy the following requirements:

1. The cache has a fixed maximum capacity specified at construction time.
2. It supports three operations:
   - `get(key)`: Returns the value associated with the key, or `None` if the key is not present. Accessing a key should mark it as most recently used.
   - `put(key, value)`: Inserts or updates the key-value pair. If the cache is at capacity and a new key is inserted, the least recently used entry must be evicted.
   - `delete(key)`: Removes the key from the cache if present. Returns `True` if the key was found and removed, `False` otherwise.
3. The cache must be safe to use from multiple threads simultaneously. Concurrent `get` operations on different keys should not block each other. You should minimize contention — a single coarse-grained lock around everything is not acceptable.
4. The eviction policy must be strictly LRU: the entry that was accessed (via `get` or `put`) least recently must be the one evicted.
5. Handle edge cases: capacity of 1, rapid concurrent puts that trigger evictions, interleaved `get`/`put`/`delete` on the same key from different threads, and zero or negative capacity (raise `ValueError`).

Provide your complete implementation as a single Python module. Include a brief explanation of your concurrency strategy and why it preserves correctness. Also include a short demonstration (in a main block or test function) that spawns multiple threads performing mixed `get`/`put`/`delete` operations and asserts that the cache never exceeds its capacity and that no data corruption occurs.

59
Mar 19, 2026 11:51

Coding

Anthropic Claude Opus 4.6 VS OpenAI GPT-5.4

In-Memory Key-Value Store with Transaction Support

Write a Python class `InMemoryDB` that implements a simple in-memory key-value data store with support for nested transactions. The class should have the following methods:

- `get(key)`: Returns the value associated with a key. If the key does not exist, it should return `None`.
- `set(key, value)`: Sets the value for a given key. If a transaction is in progress, this change should only be visible within that transaction until it is committed.
- `begin()`: Starts a new transaction. Transactions can be nested.
- `commit()`: Commits all changes made in the current transaction to its parent transaction (or to the main store if it's the outermost transaction). If there is no active transaction, it should raise an error.
- `rollback()`: Discards all changes made in the current transaction. If there is no active transaction, it should raise an error.

53
Mar 19, 2026 02:35
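The nested-transaction store in the task above maps naturally onto a stack of overlay dicts: reads search the innermost overlay outward, `commit()` merges the top overlay into its parent, and `rollback()` discards it. A minimal sketch under that design (the `RuntimeError` choice for "raise an error" is an assumption, since the task doesn't name an exception type):

```python
class InMemoryDB:
    """Sketch: pending writes live in per-transaction overlay dicts stacked
    above the committed store."""

    def __init__(self) -> None:
        self._store = {}     # committed data
        self._overlays = []  # stack of per-transaction pending writes

    def get(self, key):
        # Innermost transaction first, then outward to the main store.
        for overlay in reversed(self._overlays):
            if key in overlay:
                return overlay[key]
        return self._store.get(key)

    def set(self, key, value):
        target = self._overlays[-1] if self._overlays else self._store
        target[key] = value

    def begin(self):
        self._overlays.append({})

    def commit(self):
        if not self._overlays:
            raise RuntimeError("no active transaction")
        pending = self._overlays.pop()
        target = self._overlays[-1] if self._overlays else self._store
        target.update(pending)  # merge into parent (or main store)

    def rollback(self):
        if not self._overlays:
            raise RuntimeError("no active transaction")
        self._overlays.pop()  # discard pending writes
```

The overlay approach makes `begin`/`rollback` O(1); the alternative of snapshotting the whole store on `begin` makes `rollback` O(1) too but costs O(n) per transaction start.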
