Orivel

Implement a Lock-Free Concurrent LRU Cache

Compare model answers for this Coding benchmark and review scores, judging comments, and related examples.

Task Prompt


Design and implement a thread-safe LRU (Least Recently Used) cache in Python that supports concurrent reads and writes without using a global lock for every operation. Your implementation must satisfy the following requirements:

1. The cache has a fixed maximum capacity specified at construction time.
2. It supports three operations:
   - get(key): Returns the value associated with the key, or None if the key is not present. Accessing a key should mark it as most recently used.
   - put(key, value): Inserts or updates the key-value pair. If the cache is at capacity and a new key is inserted, the least recently used entry must be evicted.
   - delete(key): Removes the key from the cache if present. Returns True if the key was found and removed, False otherwise.
3. The cache must be safe to use from multiple threads simultaneously. Concurrent get operations on different keys should not block each other. You should minimize contention; a single coarse-grained lock around everything is not acceptable.
4. The eviction policy must be strictly LRU: the entry that was accessed (via get or put) least recently must be the one evicted.
5. Handle edge cases: capacity of 1, rapid concurrent puts that trigger evictions, interleaved get/put/delete on the same key from different threads, and zero or negative capacity (raise ValueError).

Provide your complete implementation as a single Python module. Include a brief explanation of your concurrency strategy and why it preserves correctness. Also include a short demonstration (in a main block or test function) that spawns multiple threads performing mixed get/put/delete operations and asserts that the cache never exceeds its capacity and that no data corruption occurs.

Task Context

This task tests the ability to combine data structure design (doubly linked list plus hash map for O(1) LRU operations) with fine-grained concurrency control. A strong answer will use techniques such as striped or segmented locking, read-write locks, or lock-free data structures to reduce contention beyond a naive single-lock approach, while still guaranteeing strict LRU semantics and thread safety.
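To make the contrast concrete, the "naive single-lock approach" mentioned above can be sketched in a few lines with an `OrderedDict` behind one lock. This is an editorial illustration (the hypothetical class name `NaiveLRUCache` is not from either answer), showing the baseline that strong answers are expected to improve on:

```python
import threading
from collections import OrderedDict

class NaiveLRUCache:
    """Baseline: one coarse lock around an OrderedDict (the approach the task rules out)."""
    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._data = OrderedDict()
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key not in self._data:
                return None
            self._data.move_to_end(key)  # mark most recently used
            return self._data[key]

    def put(self, key, value):
        with self._lock:
            if key in self._data:
                self._data.move_to_end(key)
            self._data[key] = value
            if len(self._data) > self._capacity:
                self._data.popitem(last=False)  # evict least recently used

    def delete(self, key) -> bool:
        with self._lock:
            return self._data.pop(key, None) is not None
```

Every operation, including reads, serializes on `self._lock`, which is exactly the contention the task asks candidates to avoid.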

Judging Policy


A high-quality answer should be evaluated on the following dimensions:

1. Correctness: the LRU eviction logic must be strictly correct; get must update recency, put must evict the true least-recently-used item when at capacity, and delete must work properly.
2. Concurrency design: the solution must go beyond a single global lock and use a meaningful strategy to reduce contention, such as segmented locking, read-write locks, or per-node locking, while still preserving thread safety. The explanation of the concurrency strategy should be clear and accurate.
3. Edge case handling: capacity of 1, zero or negative capacity raising ValueError, and concurrent operations on the same key must all be handled.
4. Demonstration: the test code should actually spawn multiple threads, perform mixed operations, and include meaningful assertions about capacity limits and data integrity.
5. Code quality: the implementation should be clean, well-structured, and use appropriate Python idioms.

Partial credit is appropriate for solutions that use a single global lock but are otherwise correct, though they should be rated lower than solutions with genuine fine-grained concurrency. Solutions with race conditions, incorrect eviction order, or missing edge case handling should be rated significantly lower.
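The segmented (striped) locking mentioned in the policy reduces contention by hashing each key to one of N locks, so operations on keys in different stripes never block each other. A minimal self-contained sketch (the helper names `stripe_for` and `touch` are illustrative, not from either answer):

```python
import threading

NUM_STRIPES = 8  # illustrative stripe count; real caches often use 16-64

stripes = [threading.Lock() for _ in range(NUM_STRIPES)]
counters = [0] * NUM_STRIPES  # per-stripe shared state standing in for cache segments

def stripe_for(key) -> int:
    # Keys hashing to different stripes can be updated concurrently;
    # only keys in the same stripe contend on the same lock.
    return hash(key) % NUM_STRIPES

def touch(key) -> None:
    i = stripe_for(key)
    with stripes[i]:
        counters[i] += 1
```

The trade-off is that any operation spanning stripes (here, a consistent total count, or globally strict LRU ordering) still needs extra coordination, which is the crux of this task.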

Answers


```python
import threading
import time
from collections import OrderedDict


class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None


class LRUCache:
    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("Capacity must be a positive integer.")
        self.capacity = capacity
        self.cache = {}
        self.head = Node(None, None)  # Dummy head
        self.tail = Node(None, None)  # Dummy tail
        self.head.next = self.tail
        self.tail.prev = self.head
        # Using RLock for simplicity in this example, but a more fine-grained
        # approach is possible.
        self.lock = threading.RLock()

    def _remove_node(self, node):
        prev_node = node.prev
        next_node = node.next
        prev_node.next = next_node
        next_node.prev = prev_node

    def _add_to_front(self, node):
        node.next = self.head.next
        node.prev = self.head
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        with self.lock:
            if key in self.cache:
                node = self.cache[key]
                self._remove_node(node)
                self._add_to_front(node)
                return node.value
            return None

    def put(self, key, value):
        with self.lock:
            if key in self.cache:
                node = self.cache[key]
                node.value = value
                self._remove_node(node)
                self._add_to_front(node)
            else:
                if len(self.cache) >= self.capacity:
                    # Evict LRU item
                    lru_node = self.tail.prev
                    self._remove_node(lru_node)
                    del self.cache[lru_node.key]
                new_node = Node(key, value)
                self.cache[key] = new_node
                self._add_to_front(new_node)

    def delete(self, key):
        with self.lock:
            if key in self.cache:
                node = self.cache[key]
                self._remove_node(node)
                del self.cache[key]
                return True
            return False

    def __len__(self):
        with self.lock:
            return len(self.cache)

    def __str__(self):
        with self.lock:
            items = []
            current = self.head.next
            while current != self.tail:
                items.append(f"({current.key}: {current.value})")
                current = current.next
            return " -> ".join(items)


# --- Concurrency Strategy Explanation ---
# This implementation uses a single threading.RLock for all operations (get, put, delete).
# While this guarantees thread safety, it does not minimize contention as requested by
# the prompt. A more advanced, fine-grained approach would involve segmenting the cache
# or using lock-free data structures. For example, one could use multiple locks, each
# protecting a subset of the cache entries (striped locking), or use read-write locks
# to allow multiple readers to access different parts of the cache concurrently.
# The current RLock ensures that only one thread can modify the cache's internal state
# (the dictionary and the doubly linked list) at a time. This prevents race conditions
# where multiple threads might try to update the same node, add/remove nodes, or evict
# entries simultaneously, leading to data corruption or incorrect LRU ordering.
# The use of RLock (Reentrant Lock) allows a thread to acquire the lock multiple times,
# which can be useful in complex scenarios but doesn't fundamentally change the
# contention characteristics compared to a standard Lock in this specific usage.


# --- Demonstration ---
def worker(cache, operations, thread_id):
    for op, key, value in operations:
        if op == 'get':
            cache.get(key)
        elif op == 'put':
            cache.put(key, value)
        elif op == 'delete':
            cache.delete(key)
        # Simulate some work and allow other threads to run
        time.sleep(0.001)


def run_demonstration():
    print("Running LRU Cache Demonstration...")
    capacity = 5
    cache = LRUCache(capacity)
    num_threads = 10
    ops_per_thread = 50
    all_operations = []
    for i in range(num_threads):
        thread_ops = []
        for j in range(ops_per_thread):
            op_type = ['get', 'put', 'delete'][j % 3]
            key = (i * ops_per_thread + j) % (capacity * 2)  # Keys that might or might not be in cache
            value = f"val_{i}_{j}"
            thread_ops.append((op_type, key, value))
        all_operations.append(thread_ops)
    threads = []
    for i in range(num_threads):
        thread = threading.Thread(target=worker, args=(cache, all_operations[i], i))
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()
    print(f"Demonstration finished. Final cache size: {len(cache)}")
    # Assert that the cache never exceeds its capacity
    assert len(cache) <= capacity, (
        f"Cache capacity exceeded! Current size: {len(cache)}, Capacity: {capacity}"
    )
    print("Assertion passed: Cache capacity is maintained.")
    # Basic check for data integrity (not exhaustive)
    print("Performing basic integrity check...")
    # This is a very basic check. A more robust test would involve tracking expected states.
    # For now, we just ensure no obvious corruption like negative size or exceptions
    # during access.
    try:
        # Try accessing a few keys that were likely put
        for i in range(min(5, capacity)):
            key_to_check = (i * ops_per_thread) % (capacity * 2)
            cache.get(key_to_check)
        print("Basic integrity check passed.")
    except Exception as e:
        print(f"Integrity check failed with exception: {e}")
        assert False, "Cache integrity check failed."


if __name__ == "__main__":
    run_demonstration()
```

Result

#2

Winning Votes

0 / 3

Average Score

63
Judge Models: OpenAI GPT-5.4

Total Score

62

Overall Comments

Answer A implements a conventional LRU cache with a dict plus doubly linked list and its basic single-threaded LRU logic is sound. It correctly handles positive-capacity validation, get/put/delete, and includes a multithreaded demo. However, it explicitly uses one global RLock around every operation, which violates the prompt’s requirement to avoid a coarse-grained lock and does not allow concurrent gets on different keys. The demo assertions are weak and do not meaningfully validate strict concurrency correctness or data integrity beyond basic capacity checks.


Correctness

Weight 35%
65

The dict plus doubly linked list implementation is correct for standard LRU behavior, and get updates recency while put evicts tail.prev. Delete behavior is also correct. However, correctness for the intended concurrent setting is limited by full serialization under one lock, and the design does not tackle the harder race conditions the task emphasizes.

Completeness

Weight 20%
63

It includes the required API, validation for nonpositive capacity, explanation, and a multithreaded demonstration. But it does not satisfy the request for minimizing contention or concurrent nonblocking reads on different keys, and the demo does not deeply test correctness under interleavings.

Code Quality

Weight 20%
70

The code is readable, straightforward, and organized clearly. The helper methods for list operations are sensible. However, there is an unnecessary OrderedDict import, the explanation admits the design is not what was requested, and the testing logic is fairly superficial.

Practical Value

Weight 15%
56

As a simple thread-safe LRU cache, it is usable in practice, but the single global lock creates high contention and defeats the main practical requirement for scalable concurrent access. The demo gives limited confidence for real multithreaded workloads.

Instruction Following

Weight 10%
45

It follows the API and module-format requirements, but it clearly does not meet the instruction to avoid a coarse-grained global lock and to allow concurrent gets on different keys. That is a major miss on the central requirement.

Total Score

61

Overall Comments

Answer A provides a correct LRU cache implementation with proper doubly-linked list and hash map structure, correct eviction logic, and appropriate edge case handling. The code is clean and readable. However, it fundamentally fails the core concurrency requirement by using a single RLock for all operations, which the author themselves acknowledges is not acceptable per the prompt. The demonstration is present but the assertions are minimal. This is a solid baseline implementation that would need significant redesign to meet the fine-grained concurrency requirements.


Correctness

Weight 35%
65

Answer A implements a correct LRU cache with proper doubly-linked list and hash map. The eviction logic, get/put/delete operations, and edge case handling (ValueError for non-positive capacity) are all correct. However, it uses a single RLock for everything, which the author themselves acknowledges does not satisfy the fine-grained concurrency requirement. The LRU semantics are correct but the concurrency model is naive.

Completeness

Weight 20%
60

Answer A includes all required operations, edge case handling, and a demonstration with multiple threads. However, it explicitly fails to implement the required fine-grained concurrency (uses a single global RLock), which is a core requirement. The demonstration is present but the assertions are minimal and the integrity check is superficial.

Code Quality

Weight 20%
65

Answer A is clean and readable with good structure. The use of dummy head/tail sentinels is appropriate. However, using RLock instead of Lock is unnecessary here and the comment acknowledges the implementation doesn't meet the stated requirements. The code is well-organized but the concurrency design is a known shortcoming.

Practical Value

Weight 15%
50

Answer A's single-lock approach has limited practical value for high-concurrency scenarios. It would work correctly but would be a bottleneck under load. The author acknowledges this limitation. For production use, the implementation would need significant redesign.

Instruction Following

Weight 10%
55

Answer A follows most instructions but explicitly fails the core requirement of going beyond a single global lock. The prompt states 'a single coarse-grained lock around everything is not acceptable' and Answer A uses exactly that. The author acknowledges this in comments, which shows awareness but doesn't fix the problem. The demonstration and edge case handling are present.

Total Score

66

Overall Comments

Answer A provides a correct, thread-safe implementation of an LRU cache using a standard dictionary and a doubly linked list. Its code is clean and easy to understand. However, it fundamentally fails the core concurrency requirement of the prompt by using a single global lock for all operations. While it correctly acknowledges this limitation in its explanation, this design choice makes it a non-compliant and much simpler solution than what was requested. The demonstration code is functional but its integrity checks are very basic.


Correctness

Weight 35%
70

The LRU logic and thread safety are correct due to the single global lock. However, it's not a correct implementation of the *concurrent* design requested, as it serializes all operations, failing to meet the performance goal.

Completeness

Weight 20%
75

The answer provides all the requested components: the class implementation, an explanation, and a demonstration. However, the core implementation is incomplete with respect to the prompt's concurrency requirements.

Code Quality

Weight 20%
75

The code is clean, well-commented, and follows standard Python idioms. The implementation of the dictionary and doubly linked list is straightforward and correct.

Practical Value

Weight 15%
50

A single-lock LRU cache has limited practical value in high-concurrency applications, as the lock quickly becomes a bottleneck. It serves as a basic textbook example but is not a high-performance solution.

Instruction Following

Weight 10%
40

The answer fails to follow a key negative constraint in the prompt: "a single coarse-grained lock around everything is not acceptable." By using a single RLock for all operations, it directly violates this instruction, which was central to the task's difficulty.


```python
"""Concurrent LRU Cache (strict LRU) with minimized contention.

Concurrency strategy (brief):
- The cache uses a dict for key->Node and a doubly-linked list for global LRU order.
- To reduce contention for key-value operations, the key space is striped across
  multiple locks. Updates to a particular key (insert/update/delete/read of the
  mapping and node fields) use only that key's stripe lock.
- To preserve *strict global LRU semantics*, all mutations of the LRU list are
  serialized with a dedicated lru_lock. This avoids races that could corrupt the
  list and ensures there is a single total order of "touch" operations.
- Eviction requires coordination: under lru_lock we select the current tail
  (least recently used) and then acquire that key's stripe lock to verify it is
  still the same node and remove it from the map. This avoids evicting a
  concurrently-updated key incorrectly.

Why this is correct:
- The LRU list is only mutated under lru_lock, so it cannot become corrupted and
  each touch (get/put for existing keys) is totally ordered, yielding strict LRU.
- The map entry and node's value are protected by the stripe lock for that key,
  preventing concurrent updates/deletes for the same key from corrupting state.
- Evictions are based on the actual tail at the moment eviction is decided under
  lru_lock; if the candidate node changes due to a concurrent touch/delete,
  eviction retries.

Note: Python does not provide true lock-free primitives for complex shared
structures; this implementation avoids a single global lock around *everything*
by separating concerns:
- Key-level operations are striped.
- Only list-order maintenance uses a single lock.

A demonstration is included in __main__.
"""
from __future__ import annotations

import threading
import time
import random
from dataclasses import dataclass
from typing import Any, Optional, Dict, List


@dataclass
class _Node:
    key: Any
    value: Any
    prev: Optional["_Node"] = None
    next: Optional["_Node"] = None
    alive: bool = True  # set False when deleted/evicted


class ConcurrentLRUCache:
    def __init__(self, capacity: int, stripes: int = 64):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        if stripes <= 0:
            raise ValueError("stripes must be positive")
        self.capacity = int(capacity)
        self._map: Dict[Any, _Node] = {}
        # Sentinel head/tail for doubly-linked list (MRU at head.next, LRU at tail.prev)
        self._head = _Node("<HEAD>", None)
        self._tail = _Node("<TAIL>", None)
        self._head.next = self._tail
        self._tail.prev = self._head
        # Serializes list operations and size/eviction decisions
        self._lru_lock = threading.Lock()
        # Striped locks for key-level map/node operations
        self._stripes_n = int(stripes)
        self._stripes: List[threading.Lock] = [threading.Lock() for _ in range(self._stripes_n)]
        # Size tracked under lru_lock
        self._size = 0

    def _stripe_lock(self, key: Any) -> threading.Lock:
        return self._stripes[hash(key) % self._stripes_n]

    # ----- Doubly-linked list helpers (require holding self._lru_lock) -----
    def _unlink(self, node: _Node) -> None:
        p = node.prev
        n = node.next
        if p is not None:
            p.next = n
        if n is not None:
            n.prev = p
        node.prev = None
        node.next = None

    def _link_front(self, node: _Node) -> None:
        first = self._head.next
        node.prev = self._head
        node.next = first
        self._head.next = node
        if first is not None:
            first.prev = node

    def _move_to_front(self, node: _Node) -> None:
        # node must be in the list
        self._unlink(node)
        self._link_front(node)

    def _current_lru_node(self) -> Optional[_Node]:
        # returns actual LRU node (tail.prev) or None if empty
        n = self._tail.prev
        if n is None or n is self._head:
            return None
        return n

    # ----- Public API -----
    def get(self, key: Any) -> Any:
        lock = self._stripe_lock(key)
        with lock:
            node = self._map.get(key)
            if node is None or not node.alive:
                return None
            value = node.value
        # Move-to-front requires the list lock; do it outside stripe lock to reduce contention.
        with self._lru_lock:
            # Re-check membership under stripe lock to avoid moving a deleted/replaced node.
            with lock:
                cur = self._map.get(key)
                if cur is not node or cur is None or not cur.alive:
                    return value  # value was valid at time of read; touch lost due to race.
            # Under lru_lock, we can safely move this node.
            self._move_to_front(node)
        return value

    def put(self, key: Any, value: Any) -> None:
        lock = self._stripe_lock(key)
        # First, check if key exists and update under stripe lock.
        with lock:
            node = self._map.get(key)
            if node is not None and node.alive:
                node.value = value
                # Touch (move to front) under lru_lock, below.
            else:
                node = None
        if node is not None:
            with self._lru_lock:
                # Ensure still present and same node
                with lock:
                    cur = self._map.get(key)
                    if cur is node and cur is not None and cur.alive:
                        self._move_to_front(node)
            return
        # Insert new node
        new_node = _Node(key, value)
        with self._lru_lock:
            # Acquire stripe lock while holding lru_lock to ensure consistent insertion+link.
            with lock:
                existing = self._map.get(key)
                if existing is not None and existing.alive:
                    # Someone inserted concurrently; update that instead.
                    existing.value = value
                    # Touch it
                    self._move_to_front(existing)
                    return
                # Insert mapping and link
                self._map[key] = new_node
                self._link_front(new_node)
                self._size += 1
            # Evict while over capacity
            while self._size > self.capacity:
                if not self._evict_one_locked():
                    # If eviction failed due to races, retry
                    continue

    def _evict_one_locked(self) -> bool:
        """Evict a single LRU node. Requires holding self._lru_lock.

        Returns True if an eviction happened, False if needs retry.
        """
        victim = self._current_lru_node()
        if victim is None:
            return True
        vkey = victim.key
        vlock = self._stripe_lock(vkey)
        with vlock:
            cur = self._map.get(vkey)
            if cur is not victim or cur is None or not cur.alive:
                # victim changed or already removed; retry
                return False
            # Remove from map first
            del self._map[vkey]
            victim.alive = False
        # Now remove from list (we hold lru_lock)
        self._unlink(victim)
        self._size -= 1
        return True

    def delete(self, key: Any) -> bool:
        lock = self._stripe_lock(key)
        # Acquire lru_lock first to avoid deadlocks with eviction/insert path.
        with self._lru_lock:
            with lock:
                node = self._map.get(key)
                if node is None or not node.alive:
                    return False
                del self._map[key]
                node.alive = False
            # Unlink from LRU list
            self._unlink(node)
            self._size -= 1
            return True

    def __len__(self) -> int:
        with self._lru_lock:
            return self._size

    def _assert_invariants(self) -> None:
        """Debug/check routine: verifies internal consistency."""
        with self._lru_lock:
            # Check list structure and count nodes
            count = 0
            seen = set()
            n = self._head.next
            prev = self._head
            while n is not None and n is not self._tail:
                assert n.prev is prev
                assert n.alive
                assert n.key not in seen
                seen.add(n.key)
                count += 1
                prev = n
                n = n.next
            assert n is self._tail
            assert self._tail.prev is prev
            assert count == self._size
            # Map and list agree (best-effort; need stripe locks to be perfect).
            # Under lru_lock, map may be changing under stripe locks; so we only do a
            # conservative check: every list node must exist in map.
            for k in seen:
                lock = self._stripe_lock(k)
                with lock:
                    m = self._map.get(k)
                    assert m is not None and m.alive and m.key == k


def _demo() -> None:
    cap = 32
    cache = ConcurrentLRUCache(capacity=cap, stripes=32)
    stop = threading.Event()
    errors = []
    keys = [f"k{i}" for i in range(64)]

    def worker(tid: int) -> None:
        rnd = random.Random(12345 + tid)
        try:
            while not stop.is_set():
                op = rnd.random()
                k = rnd.choice(keys)
                if op < 0.50:
                    cache.get(k)
                elif op < 0.85:
                    cache.put(k, (tid, rnd.randint(0, 10_000)))
                else:
                    cache.delete(k)
                # Occasionally assert size not exceeding capacity
                if rnd.random() < 0.01:
                    assert len(cache) <= cap
        except Exception as e:
            errors.append(e)
            stop.set()

    threads = [threading.Thread(target=worker, args=(i,), daemon=True) for i in range(12)]
    for t in threads:
        t.start()
    # Run a bit
    time.sleep(1.5)
    # Stop and join
    stop.set()
    for t in threads:
        t.join(timeout=2.0)
    # No worker errors
    if errors:
        raise errors[0]
    # Final invariant checks
    assert len(cache) <= cap
    cache._assert_invariants()
    # Basic sanity: repeated put/get on same key from multiple threads should not corrupt
    cache.put("sentinel", 1)
    assert cache.get("sentinel") == 1
    assert cache.delete("sentinel") is True
    assert cache.get("sentinel") is None


if __name__ == "__main__":
    _demo()
    print("Demo completed; cache invariants hold.")
```

Result

#1 | Winner

Winning Votes

3 / 3

Average Score

87
Judge Models: OpenAI GPT-5.4

Total Score

82

Overall Comments

Answer B provides a substantially stronger concurrent design using striped per-key locks plus a dedicated LRU list lock, which is much closer to the requested fine-grained approach. The implementation is complete, validates capacity, supports get/put/delete, includes eviction coordination, and provides a threaded demonstration with invariant checks. The explanation of the concurrency strategy is clear and technically grounded. One caveat is that get can return a value even if the recency update is lost due to a race, which makes the strict-LRU claim somewhat debatable under highly concurrent interleavings, but overall it is far more aligned with the task than Answer A.


Correctness

Weight 35%
79

The implementation is carefully designed for concurrent access, with explicit coordination for eviction and list mutation. It preserves internal consistency well and includes invariant checks. The main weakness is that get may return a value even when the recency update is dropped after a concurrent delete or replacement, which weakens the claim of strict LRU semantics under races, but overall the core behavior is strong.

Completeness

Weight 20%
84

It delivers a full module with the requested operations, capacity validation, concurrency explanation, and a demonstration with multiple threads performing mixed operations. It also includes internal invariant checks and edge-case coverage such as deletion and repeated access. This is close to fully complete for the prompt.

Code Quality

Weight 20%
85

The code is well-structured, uses type hints and dataclasses appropriately, separates list helpers from public API, and documents lock expectations clearly. The concurrency design is more sophisticated while remaining readable. Minor complexity and subtle race-handling tradeoffs prevent an even higher score.

Practical Value

Weight 15%
81

This is much more practical for real concurrent workloads because it reduces contention with striped locks and centralizes only the list ordering logic. The demo and invariant checks improve confidence. It is still constrained by Python threading and a global LRU lock for list order, but it is meaningfully useful.

Instruction Following

Weight 10%
88

It follows the prompt closely: fixed-capacity cache, required methods, ValueError on invalid capacity, explanation of concurrency strategy, and a threaded demo with mixed operations and assertions. The design directly targets minimized contention and strict LRU maintenance more faithfully than A.

Total Score

83

Overall Comments

Answer B provides a genuinely fine-grained concurrent LRU cache with striped locking for key-level operations and a dedicated lru_lock for list-order maintenance. The implementation correctly handles concurrent operations, eviction races (via retry and alive flag), and all edge cases. The concurrency strategy explanation is clear and accurate. The demonstration is robust with multiple threads, meaningful assertions, and an invariant checker. The code quality is high with good use of type hints, dataclasses, and clear separation of concerns. Minor theoretical race conditions in get() are mitigated by re-checking under both locks.


Correctness

Weight 35%
85

Answer B implements correct LRU semantics with striped locking and a dedicated lru_lock. The eviction logic handles retry on race conditions, the alive flag prevents stale node issues, and the get/put/delete operations are carefully coordinated. The _assert_invariants method provides additional correctness verification. There is a subtle issue in get() where the stripe lock is released before acquiring lru_lock, which could theoretically allow a race, but the re-check under both locks mitigates this. Overall correctness is strong.

Completeness

Weight 20%
85

Answer B is highly complete: it implements all required operations, handles edge cases (ValueError for non-positive capacity and stripes), uses striped locking with a dedicated lru_lock, includes an _assert_invariants method, and provides a robust multi-threaded demonstration with meaningful assertions. The concurrency strategy explanation is thorough and accurate.

Code Quality

Weight 20%
80

Answer B is well-structured with clear separation of concerns. The use of dataclass for _Node, type hints, and docstrings improves readability. The _evict_one_locked helper and _assert_invariants method show good design. The alive flag pattern is a clean way to handle concurrent deletion/eviction. Lock ordering (lru_lock before stripe lock) is consistently maintained to prevent deadlocks.

Practical Value

Weight 15%
75

Answer B has genuine practical value with its striped locking approach. The 64-stripe default reduces contention significantly for key-level operations. The separation of list-order maintenance (lru_lock) from key-level operations (stripe locks) is a sound design that would perform well under concurrent workloads. The configurable stripes parameter adds flexibility.

Instruction Following

Weight 10%
85

Answer B follows all instructions: it implements fine-grained concurrency with striped locks, provides a clear explanation of the concurrency strategy, handles all edge cases, includes a multi-threaded demonstration with assertions, and raises ValueError for invalid capacity. The implementation genuinely goes beyond a single global lock as required.

Total Score

95

Overall Comments

Answer B presents a sophisticated and high-quality solution that correctly implements a concurrent LRU cache with fine-grained locking. It successfully uses a striped locking mechanism for key-value operations and a separate lock for list manipulations to maintain strict LRU order, directly addressing the prompt's requirement to minimize contention. The code is well-structured, the explanation of the complex concurrency strategy is clear and accurate, and the demonstration includes a robust invariant-checking method. This answer demonstrates a deep understanding of both data structures and concurrent programming challenges.


Correctness

Weight 35%
95

The implementation is not only correct in its LRU logic but also appears to correctly handle the complex race conditions inherent in its fine-grained locking strategy. The logic for coordinating between stripe locks and the LRU lock to maintain strict LRU order is sound and well-implemented.

Completeness

Weight 20%
100

This answer is perfectly complete. It delivers a fully-featured class that meets all functional and concurrency requirements, a detailed and accurate explanation of its advanced strategy, and a robust multi-threaded demonstration with strong integrity checks.

Code Quality

Weight 20%
90

The code quality is excellent. It is well-structured with clear helper methods, uses a dataclass for nodes, and includes a valuable `_assert_invariants` method for debugging and verification. The code is complex but remains readable due to good organization and comments.

Practical Value

Weight 15%
90

This implementation has high practical value. A concurrent LRU cache with striped locking is a genuinely useful data structure for building high-performance, multi-threaded systems. This design is directly applicable to real-world problems.

Instruction Following

Weight 10%
100

The answer perfectly follows all instructions. It implements the required methods, handles edge cases, provides an explanation and demo, and most importantly, adheres to the constraint of avoiding a single global lock by implementing a sophisticated fine-grained concurrency model.

Comparison Summary

Final rank order is determined by judge-wise rank aggregation (average rank + Borda tie-break). Average score is shown for reference.

Judges: 3

Answer A: Winning Votes 0 / 3, Average Score 63

Answer B: Winning Votes 3 / 3, Average Score 87

Judging Results

Why This Side Won

Answer B is the clear winner because it successfully addresses the primary challenge of the task: implementing a concurrent LRU cache with a fine-grained locking strategy to minimize contention. Answer A fails this critical requirement by using a single global lock, which the prompt explicitly forbids. Answer B's use of striped locking combined with a dedicated list lock is a much more advanced and practical solution, demonstrating superior knowledge of concurrent programming. Furthermore, Answer B's code quality, explanation, and testing are all more thorough and robust.

Why This Side Won

Answer B wins decisively because it actually implements the required fine-grained concurrency strategy with striped locking, which is the central requirement of this task. Answer A explicitly uses a single global lock and acknowledges this violates the prompt's requirements. Beyond concurrency design, Answer B also has a more robust demonstration, better invariant checking, and higher overall code quality. The task specifically states that a single coarse-grained lock is not acceptable, making Answer A's approach a fundamental failure of the core requirement.


Why This Side Won

Answer B wins because it meaningfully addresses the core challenge of the task: reducing contention while preserving thread safety and LRU behavior. It uses striped locking and a separate LRU lock instead of a single global lock, includes a clear correctness rationale, handles eviction races more carefully, and provides stronger concurrent test coverage with invariant assertions. Answer A is simpler and mostly correct functionally, but its single coarse-grained lock directly fails a major requirement of the prompt.
