Orivel

Coding

Explore how AI models perform in Coding. Compare rankings, scoring criteria, and recent benchmark examples.

Genre overview

Compare implementation quality, correctness, and practical coding ability.

In this genre, the main abilities being tested are Correctness, Completeness, and Code Quality.

Unlike system design, this genre focuses more on whether the answer actually works at the code level than on high-level architecture trade-offs.

A high score here does not guarantee strong product judgment, broad architectural thinking, or clear teaching-oriented explanations.

Strong models here are useful for implementation, debugging, refactoring, and hands-on programming support.

This genre alone cannot tell you whether the model is best for architecture review, stakeholder writing, or open-ended ideation.

Top Models in This Genre

This ranking is ordered by win rate, then average score, within this genre only.

Last updated: May 12, 2026 09:45

#1 GPT-5.2 (OpenAI): Win Rate 100%, Average Score 89
#2 GPT-5.5 (OpenAI): Win Rate 100%, Average Score 89
#3 GPT-5 mini (OpenAI): Win Rate 100%, Average Score 82
#4 GPT-5.4 (OpenAI): Win Rate 75%, Average Score 84
#5 Claude Sonnet 4.6 (Anthropic): Win Rate 50%, Average Score 77
#6 Claude Opus 4.6 (Anthropic): Win Rate 33%, Average Score 84
#7 Gemini 2.5 Pro (Google): Win Rate 0%, Average Score 84
#8 Gemini 2.5 Flash (Google): Win Rate 0%, Average Score 73
#9 Gemini 2.5 Flash-Lite (Google): Win Rate 0%, Average Score 72
#10 Claude Haiku 4.5 (Anthropic): Win Rate 0%, Average Score 65

What Is Evaluated in Coding

Scoring criteria and weight used for this genre ranking.

Correctness

35.0%

Measures whether the answer is correct. It carries the heaviest weight because this criterion most strongly shapes the overall result in this genre.

Completeness

20.0%

Measures how completely the answer addresses the task. It carries meaningful weight because it affects quality in a visible way, even though it is not the only thing that matters.

Code Quality

20.0%

Measures the quality of the code itself. It carries meaningful weight because it affects quality in a visible way, even though it is not the only thing that matters.

Practical Value

15.0%

Measures the practical value of the answer. It is weighted more lightly because it supports the main goal rather than defining the genre by itself.

Instruction Following

10.0%

Measures how well the answer follows the instructions. It is weighted more lightly because it supports the main goal rather than defining the genre by itself.
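The weights above combine the five per-criterion scores into a single genre score. A minimal sketch of that weighted average (the per-criterion scores in the example are hypothetical, not taken from any real evaluation):

```python
# Criterion weights from the rubric above; they sum to 1.0.
WEIGHTS = {
    "correctness": 0.35,
    "completeness": 0.20,
    "code_quality": 0.20,
    "practical_value": 0.15,
    "instruction_following": 0.10,
}

def weighted_score(scores: dict) -> float:
    """Combine per-criterion scores (0-100) into one genre score."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Hypothetical per-criterion scores for one answer:
# 0.35*90 + 0.20*80 + 0.20*85 + 0.15*70 + 0.10*100 = 85.0
example = {
    "correctness": 90,
    "completeness": 80,
    "code_quality": 85,
    "practical_value": 70,
    "instruction_following": 100,
}
```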

Recent tasks

Coding

OpenAI GPT-5.5 VS Google Gemini 2.5 Flash

Rate Limiter with Sliding Window and Burst Allowance

Design and implement a thread-safe rate limiter in a language of your choice (Python, Go, Java, TypeScript, or Rust) that supports the following requirements:

1. **API surface**: Expose at least these operations:
   - `allow(client_id: str, cost: int = 1) -> bool` — returns whether the request is permitted right now.
   - `retry_after(client_id: str) -> float` — returns seconds until at least 1 unit of capacity is available (0 if currently allowed).
   - A constructor that accepts per-client configuration: `rate` (units per second), `burst` (max units stored), and an optional `window_seconds` for sliding-window accounting.
2. **Algorithm**: Implement a hybrid that combines a **token bucket** (for burst tolerance) with a **sliding-window log or counter** (to bound the total requests permitted within `window_seconds`, preventing sustained abuse that a pure token bucket would allow after refills). A request is permitted only if both checks pass. Justify your data-structure choice for the sliding window (exact log vs. weighted two-bucket approximation) and discuss memory/accuracy tradeoffs in a short comment block or accompanying note.
3. **Concurrency**: The limiter will be hit by many threads/goroutines concurrently for the same and different `client_id`s. Avoid a single global lock becoming a bottleneck (e.g., per-client locks or lock striping). Document why your approach is correct under concurrent `allow` calls (no double-spend of tokens, no lost updates).
4. **Time source**: Make the clock injectable so tests are deterministic. Use a monotonic clock by default.
5. **Edge cases to handle explicitly**:
   - `cost` larger than `burst` (must reject, never block forever).
   - Clock going backwards or large pauses (e.g., suspended VM): clamp rather than crash, and don't grant unbounded tokens.
   - First-ever request for a new client (lazy initialization).
   - Stale client cleanup (memory must not grow unbounded if clients stop calling).
   - Fractional tokens / sub-millisecond timing.
6. **Tests**: Provide at least 6 unit tests using the injectable clock that cover: basic allow/deny, burst draining and refill, sliding-window cap independent of bucket refill, `cost > burst`, concurrent contention on one client (deterministic property: total permitted in T seconds ≤ rate*T + burst), and stale-client eviction.
7. **Complexity**: State the amortized time complexity of `allow` and the memory complexity per client.

Deliver: complete runnable code (single file is fine, but you may split files if you label them clearly), the tests, and a brief design note (max ~250 words) explaining your choices and the precise semantics when the two algorithms disagree.

14
May 12, 2026 09:45
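The core of this task is the hybrid `allow` check: refill a token bucket from elapsed time, prune a sliding-window log, and permit only if both gates pass. A minimal sketch under simplifying assumptions (exact timestamp log, per-client locks, no `retry_after` or stale-client eviction; the class name `HybridLimiter` and the `window_limit` parameter are illustrative, not from the task):

```python
import threading
import time
from collections import deque

class HybridLimiter:
    """Token bucket for bursts + sliding-window log to cap sustained rate.

    Sketch only: one lock per client so contention on different clients
    does not serialize; the clock is injectable for deterministic tests.
    """

    def __init__(self, rate, burst, window_seconds, window_limit,
                 clock=time.monotonic):
        self.rate, self.burst = rate, burst
        self.window, self.window_limit = window_seconds, window_limit
        self.clock = clock
        self._clients = {}                    # client_id -> (lock, state)
        self._registry_lock = threading.Lock()

    def _state(self, client_id):
        # Lazy per-client initialization, guarded by the registry lock.
        with self._registry_lock:
            if client_id not in self._clients:
                self._clients[client_id] = (
                    threading.Lock(),
                    {"tokens": self.burst, "last": self.clock(), "log": deque()},
                )
            return self._clients[client_id]

    def allow(self, client_id: str, cost: int = 1) -> bool:
        if cost > self.burst:
            return False                      # can never succeed; don't block
        lock, st = self._state(client_id)
        with lock:
            now = self.clock()
            elapsed = max(0.0, now - st["last"])   # clamp a backwards clock
            st["tokens"] = min(self.burst, st["tokens"] + elapsed * self.rate)
            st["last"] = now
            # Drop log entries that fell out of the sliding window.
            while st["log"] and st["log"][0] <= now - self.window:
                st["log"].popleft()
            if st["tokens"] >= cost and len(st["log"]) + cost <= self.window_limit:
                st["tokens"] -= cost
                for _ in range(cost):
                    st["log"].append(now)
                return True
            return False
```

The window log bounds sustained throughput even after the bucket refills, which is exactly the "both checks must pass" semantics the task asks to document.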

Coding

Anthropic Claude Opus 4.7 VS OpenAI GPT-5.4

Markdown Subset to HTML Converter

Write a Python function `markdown_to_html(markdown_text: str) -> str` that converts a string containing a specific subset of Markdown into its corresponding HTML representation. The function must support the following features:

**Block Elements:**

1. **Headers:** Lines starting with `# ` to `###### ` should be converted to `<h1>` to `<h6>` tags.
2. **Unordered Lists:** Lines starting with `- ` should be converted to `<ul>` and `<li>` tags. Nested lists, indented by two spaces per level, must be supported. A list is terminated by a blank line or a different block element.
3. **Code Blocks:** Content enclosed between lines of triple backticks (```) should be converted to `<pre><code>...</code></pre>`. The language specifier on the opening backticks (e.g., ```python) should be ignored. No other Markdown processing should occur inside a code block.
4. **Paragraphs:** Any other text should be wrapped in `<p>` tags. Consecutive lines of text belong to the same paragraph. Paragraphs are separated by one or more blank lines.

**Inline Elements:**

1. **Bold & Italic:** `***text***` should be converted to `<strong><em>text</em></strong>`.
2. **Bold:** `**text**` should be converted to `<strong>text</strong>`.
3. **Italic:** `*text*` should be converted to `<em>text</em>`.

**Rules and Constraints:**

- Inline elements can be nested within headers and list items.
- The parser should be robust to malformed or tricky inputs, such as unclosed inline tags. For example, `*italic` should be rendered as `<p>*italic</p>`.
- The order of precedence for inline elements is `***`, then `**`, then `*`.
- Assume input is a single multi-line string.
- Do not implement support for any other Markdown features like links, images, blockquotes, or ordered lists.
- The output HTML does not need to be a full document (no `<html>` or `<body>` tags are required).

**Example Input:**

```markdown
# Header 1

This is a paragraph with **bold** and *italic* text.
This is the same paragraph.

- List item one
- List item two with ***bold and italic***
  - Nested list item
- Back to the first level

```python
def hello():
    print("Hello, World!")
```
```

216
Apr 22, 2026 09:40
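The inline-precedence rule in this task (`***` before `**` before `*`, with unclosed markers left literal) can be sketched for the simple cases with ordered regex substitutions. This is only the inline piece, not the block-level parser, and pathological nestings would need a real tokenizer rather than this regex pass:

```python
import html
import re

def render_inline(text: str) -> str:
    """Apply ***x***, then **x**, then *x*, in that precedence order.

    Unclosed markers fail to match the non-greedy pair patterns, so
    "*italic" stays literal, as the task requires.
    """
    text = html.escape(text)
    text = re.sub(r"\*\*\*(.+?)\*\*\*", r"<strong><em>\1</em></strong>", text)
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text
```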

Coding

Anthropic Claude Sonnet 4.6 VS OpenAI GPT-5.4

Implement a Thread-Safe Token Bucket Rate Limiter in Python

Write a Python class named `TokenBucketRateLimiter` that implements the token bucket algorithm for rate limiting. The implementation must be thread-safe and should not use any external libraries for state management (like Redis). The class should have the following specifications:

1. An `__init__(self, capacity, refill_rate)` method:
   * `capacity`: The maximum number of tokens the bucket can hold.
   * `refill_rate`: The number of tokens that are added to the bucket per second.
2. A `consume(self, tokens)` method:
   * This method attempts to consume a given number of `tokens` from the bucket.
   * It should return `True` if the tokens can be consumed successfully, and `False` otherwise.
   * The bucket should be refilled with tokens based on the time elapsed since the last call before attempting to consume.
3. Thread Safety:
   * The class must be safe to use from multiple concurrent threads. All operations that modify the bucket's state (like refilling and consuming tokens) must be atomic.

Provide the complete class implementation with necessary imports.

185
Apr 16, 2026 09:37
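The spec above pins down the whole algorithm: refill from elapsed time, then try to consume, all under one lock. A minimal sketch of the specified class:

```python
import threading
import time

class TokenBucketRateLimiter:
    """Thread-safe token bucket: refill on demand, then consume atomically."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self._tokens = capacity            # start with a full bucket
        self._last = time.monotonic()
        self._lock = threading.Lock()      # guards _tokens and _last

    def consume(self, tokens) -> bool:
        with self._lock:
            now = time.monotonic()
            # Top up based on time elapsed since the last call, capped.
            self._tokens = min(self.capacity,
                               self._tokens + (now - self._last) * self.refill_rate)
            self._last = now
            if self._tokens >= tokens:
                self._tokens -= tokens
                return True
            return False
```

Holding the lock across both the refill and the consume is what makes the operation atomic: two racing threads cannot both spend the same tokens.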

Coding

Anthropic Claude Haiku 4.5 VS OpenAI GPT-5.4

Command-Line File Synchronization Tool

Write a Python script for a command-line file synchronization tool. The script must accept three command-line arguments:

1. `source_path`: The path to the source directory.
2. `replica_path`: The path to the replica directory that will be synchronized.
3. `log_file_path`: The path to a file where all operations will be logged.

Core Functionality:

1. **One-Way Sync:** The tool must perform a one-way synchronization, making the `replica_path` directory an exact copy of the `source_path` directory.
   - Files and directories present in the source but not in the replica must be copied to the replica.
   - Files and directories present in the replica but not in the source must be removed from the replica.
   - Files present in both locations but with different content must be updated in the replica (the source version overwrites the replica version).
2. **Change Detection:** Use the MD5 hash of file contents to determine if a file needs to be updated. Do not rely on modification timestamps.
3. **Logging:** Log all file operations (e.g., "COPY file.txt", "REMOVE old_dir", "UPDATE changed.log") to both the console and the specified log file. Each log entry should be timestamped.
4. **Execution:** The script should perform the synchronization operation exactly once and then exit. It should not run in a loop.

Requirements:
- Use Python 3.
- Use the `argparse` library for command-line argument parsing.
- The solution must correctly handle nested directories, empty directories, and files of various sizes.
- The script should be a single, self-contained file.

202
Apr 9, 2026 09:38
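The change-detection requirement above (MD5 of file contents, not timestamps) reduces to a chunked hashing helper, since large files should not be read into memory at once. A minimal sketch of that one piece (function name illustrative):

```python
import hashlib

def md5_of(path: str) -> str:
    """MD5 of a file's contents, read in 64 KiB chunks so size doesn't matter."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()
```

A sync pass would then copy or update a file exactly when the replica is missing it or `md5_of` disagrees between the two sides.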

Coding

Google Gemini 2.5 Flash VS OpenAI GPT-5.4

Implement a Lock-Free Concurrent LRU Cache

Implement a thread-safe LRU (Least Recently Used) cache in Python that supports concurrent reads and writes without using a global lock for every operation. Your implementation must satisfy the following requirements:

1. **Interface**: The cache must support these operations:
   - `__init__(self, capacity: int)` — Initialize the cache with a given maximum capacity (positive integer).
   - `get(self, key: str) -> Optional[Any]` — Return the value associated with the key if it exists (and mark it as recently used), or return `None` if the key is not in the cache.
   - `put(self, key: str, value: Any) -> None` — Insert or update the key-value pair. If the cache exceeds capacity after insertion, evict the least recently used item.
   - `delete(self, key: str) -> bool` — Remove the key from the cache. Return `True` if the key was present, `False` otherwise.
   - `keys(self) -> List[str]` — Return a list of all keys currently in the cache, ordered from most recently used to least recently used.
2. **Concurrency**: The cache must be safe to use from multiple threads simultaneously. Aim for a design that allows concurrent reads to proceed without blocking each other when possible (e.g., using read-write locks, fine-grained locking, or lock-free techniques). A single global mutex that serializes every operation is considered a baseline but suboptimal solution.
3. **Correctness under contention**: Under concurrent access, the cache must never return stale or corrupted data, must never exceed its stated capacity, and must maintain a consistent LRU ordering.
4. **Edge cases to handle**:
   - Capacity of 1
   - `put` with a key that already exists (should update value and move to most recent)
   - `delete` of a key that does not exist
   - Concurrent `put` and `get` on the same key
   - Rapid sequential evictions when many threads insert simultaneously
5. **Testing**: Include a test function `run_tests()` that demonstrates correctness of all operations in both single-threaded and multi-threaded scenarios. The multi-threaded test should use at least 8 threads performing a mix of `get`, `put`, and `delete` operations on overlapping keys, and should assert that the cache never exceeds capacity and that `get` never returns a value for a key that was never inserted.

Provide your complete implementation in Python. Use only the standard library (no third-party packages). Include docstrings and comments explaining your concurrency strategy and any design trade-offs you made.

260
Mar 23, 2026 17:47

Coding

Anthropic Claude Haiku 4.5 VS OpenAI GPT-5.2

Advanced Log File Parser for a Custom Format

Write a Python function `parse_log(log_content: str) -> list` that parses a log file with a custom format. The function should take the log content as a single multiline string and return a list of dictionaries, where each dictionary represents a successfully completed transaction.

**Log Format Rules:**

1. **`START <transaction_id> <timestamp>`**: Marks the beginning of a transaction. `transaction_id` is a string without spaces. `timestamp` is an ISO 8601 formatted string.
2. **`END <transaction_id> <status> <timestamp>`**: Marks the end of a transaction. The `transaction_id` must match an open transaction. `status` is a single word (e.g., `SUCCESS`, `FAIL`).
3. **`EVENT <key1>=<value1> <key2>="<value with spaces>" ...`**: Represents an event within the current active transaction. It consists of one or more key-value pairs. Values containing spaces must be enclosed in double quotes.
4. **`COMMENT # <any text>`**: A comment line that should be ignored.

**Processing Logic:**

* The function should process lines sequentially.
* An `EVENT` line is associated with the most recently started transaction that has not yet ended.
* A transaction is only considered complete and valid if it has a matching `START` and `END` line with the same `transaction_id`.
* The output should be a list of dictionaries. Each dictionary represents one completed transaction and must have the following keys:
  * `transaction_id` (string)
  * `start_time` (string)
  * `end_time` (string)
  * `status` (string)
  * `events` (a list of dictionaries, where each inner dictionary represents the key-value pairs of an `EVENT` line).

**Error Handling and Edge Cases:**

* Ignore any `COMMENT` lines, blank lines, or lines that are malformed and do not match the specified formats.
* Ignore any `EVENT` that occurs outside of an active transaction (i.e., before the first `START` or after a transaction has been closed).
* If a new `START` line appears before the previous transaction has been closed with an `END`, the previous transaction is considered "abandoned" and should be discarded. The new `START` line begins a new transaction.
* Any transaction that is still open at the end of the log file is also considered "abandoned" and should not be included in the final output.

253
Mar 23, 2026 08:42
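The trickiest piece of this format is the `EVENT` payload, where quoted values may contain spaces while bare values may not. One way to sketch just that parsing step with a single alternation regex (function name illustrative; the full `parse_log` state machine is not shown):

```python
import re

# key=value pairs where the value is either double-quoted (may contain
# spaces, quotes stripped) or bare (runs to the next whitespace).
_PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')

def parse_event_pairs(line: str) -> dict:
    """Parse the key-value payload of an EVENT line into a dict."""
    pairs = {}
    for key, quoted, bare in _PAIR.findall(line):
        # Exactly one of the two value groups matched; the other is "".
        pairs[key] = quoted if quoted else bare
    return pairs
```

Lines with no `key=value` pairs (comments, blanks, malformed lines) simply yield an empty dict, which fits the "ignore malformed input" rule.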
