Implement a Lock-Free Concurrent Skip List with Range Queries

Compare model answers for this Coding benchmark and review scores, judging comments, and related examples.

X f L

Contents

Task Overview

Benchmark Genres

Coding

Task Creator Model The task creator is randomly selected from top task-generation models of supported providers.

Anthropic Claude Opus 4.6

Answering Models In this benchmark, models from the same provider as the task creator are excluded from answering.

Answer A Google Gemini 2.5 Flash

Answer B OpenAI GPT-5.2

Judge Models Judging uses exactly 3 judge models, excluding the answering models. At least 1 judge is selected from flagship models, lightweight models are not selected as judges, and the 3 judges come from 3 distinct providers.

OpenAI GPT-5.4 Anthropic Claude Sonnet 4.6 Google Gemini 2.5 Pro

Task Prompt

Show more ▼

Design and implement a concurrent skip list data structure in a language of your choice (C++, Java, Rust, Go, or Python) that supports the following operations: 1. **insert(key, value)** – Insert a key-value pair. If the key already exists, update the value atomically. Returns true if a new key was inserted, false if updated. 2. **remove(key)** – Logically delete the key-value pair. Returns true if the key was found and removed, false otherwise. 3. **find(key)** – Return the value associated with the key, or indicate absence. 4. **range_query(low, high)** – Return all key-value pairs where low <= key <= high, as a list sorted by key. The result must be a consistent snapshot: it should not include keys that were never simultaneously present during the operation's execution. 5. **size()** – Return the approximate number of active (non-deleted) elements. Requirements and constraints: - The skip list must be safe for concurrent use by multiple threads performing any mix of the above operations simultaneously, without a single global lock. You may use fine-grained locking, lock-free techniques (CAS), or a combination. - Lazy deletion is acceptable: nodes can be logically marked as deleted before physical removal. - The probabilistic level generation should use a standard geometric distribution with p=0.5 and a maximum level of 32. - Keys are 64-bit integers; values are strings. - Include proper memory safety considerations. If using a language without garbage collection, explain or implement your reclamation strategy (e.g., epoch-based reclamation, hazard pointers). Deliverables: 1. Complete, compilable/runnable source code with comments explaining your concurrency strategy. 2. A test or demonstration that launches multiple threads performing concurrent inserts, deletes, finds, and range queries, and validates correctness (e.g., no lost updates, no phantom reads in range queries, no crashes). 3. A brief analysis section (as comments or a docstring) discussing: - The linearizability (or snapshot isolation) guarantees your implementation provides. - The expected time complexity of each operation. - Known limitations or potential ABA issues and how you address them. Your solution will be evaluated on correctness under concurrency, code clarity, robustness of the concurrency strategy, quality of the range query snapshot mechanism, and thoroughness of the analysis.

Judging Policy

Show more ▼

A high-quality response should be evaluated on the following dimensions: 1. Correctness and completeness: All five operations must be implemented. The code should compile or run without errors. Insert, remove, and find must behave correctly under concurrent access. Range query must provide some form of snapshot consistency, not just an arbitrary interleaving. 2. Concurrency strategy: The solution must not use a single global lock. Fine-grained locking (per-node or per-level) or lock-free CAS-based approaches are expected. The strategy should be clearly explained and correctly implemented, with proper handling of edge cases like concurrent inserts at the same key, concurrent insert and delete of the same key, and deletions during range scans. 3. Skip list structure: The probabilistic leveling should follow the specified distribution. The maximum level should be 32. The data structure should maintain proper skip list invariants. 4. Memory safety: For languages without GC, there must be a reclamation strategy discussed or implemented. For GC languages, the solution should still avoid dangling references and ensure logical consistency. 5. Testing: A meaningful concurrent test must be included that exercises multiple operations simultaneously and checks for correctness, not just absence of crashes. 6. Analysis quality: The discussion of linearizability guarantees, time complexity, and limitations should be accurate and demonstrate understanding of the concurrency challenges involved. 7. Code quality: Clear variable naming, comments explaining non-obvious decisions, and logical organization. Overly simplistic solutions that just wrap everything in a mutex should be scored much lower than genuinely concurrent designs. A mediocre answer might provide a single-threaded skip list with a global lock wrapper. A good answer provides genuine fine-grained concurrency. An excellent answer provides a well-analyzed lock-free or nearly lock-free implementation with proper snapshot range queries and thorough testing.

Answers

Answer A: Google Gemini 2.5 Flash

#include <atomic>
#include <vector>
#include <string>
#include <optional>
#include <random>
#include <chrono>
#include <thread>
#include <unordered_set>
#include <iostream>
#include <algorithm>
#include <stdexcept>
#include <memory>

// Forward declaration
struct Node;

// Hazard Pointer Manager for memory reclamation
// This is a simplified implementation for demonstration purposes.
// A production-grade Hazard Pointer implementation would be more robust,
// handling thread registration/deregistration, more efficient scanning,
// and potentially using a global retired list with epoch-based reclamation.
cl...

Show Full Answer ▼

#include <atomic>
#include <vector>
#include <string>
#include <optional>
#include <random>
#include <chrono>
#include <thread>
#include <unordered_set>
#include <iostream>
#include <algorithm>
#include <stdexcept>
#include <memory>

// Forward declaration
struct Node;

// Hazard Pointer Manager for memory reclamation
// This is a simplified implementation for demonstration purposes.
// A production-grade Hazard Pointer implementation would be more robust,
// handling thread registration/deregistration, more efficient scanning,
// and potentially using a global retired list with epoch-based reclamation.
class HazardPointerManager {
public:
    // Maximum number of hazard pointers a single thread can hold simultaneously.
    // Typically, 2-3 are sufficient for most lock-free data structures (e.g., pred, curr, next).
    static const int MAX_HAZARD_POINTERS_PER_THREAD = 3;
    // Maximum number of threads expected to use the skip list concurrently.
    // This limits the size of our static hazard_records array.
    static const int MAX_THREADS = 64;

    // Structure to hold hazard pointers for a single thread.
    struct HazardRecord {
        std::atomic<Node*> hp[MAX_HAZARD_POINTERS_PER_THREAD];
        HazardRecord() {
            for (int i = 0; i < MAX_HAZARD_POINTERS_PER_THREAD; ++i) {
                hp[i].store(nullptr, std::memory_order_release);
            }
        }
    };

    // Global array of hazard records, indexed by thread ID.
    static std::vector<HazardRecord> hazard_records;
    // Counter for assigning unique thread IDs.
    static std::atomic<int> thread_id_counter;
    // Thread-local variable to store the current thread's ID.
    static thread_local int current_thread_id;
    // Thread-local list of nodes retired by this thread.
    static thread_local std::vector<Node*> retired_nodes_local;

    // Initialize the hazard pointer manager. Must be called once at startup.
    static void init() {
        hazard_records.resize(MAX_THREADS);
    }

    // Get the current thread's unique ID, assigning one if not already assigned.
    static int get_thread_id() {
        if (current_thread_id == -1) {
            current_thread_id = thread_id_counter.fetch_add(1, std::memory_order_relaxed);
            if (current_thread_id >= MAX_THREADS) {
                throw std::runtime_error("HazardPointerManager: Too many threads registered.");
            }
        }
        return current_thread_id;
    }

    // Set a hazard pointer for the current thread.
    static void set_hazard_pointer(int hp_idx, Node* ptr) {
        int tid = get_thread_id();
        hazard_records[tid].hp[hp_idx].store(ptr, std::memory_order_release);
    }

    // Clear a hazard pointer for the current thread.
    static void clear_hazard_pointer(int hp_idx) {
        int tid = get_thread_id();
        hazard_records[tid].hp[hp_idx].store(nullptr, std::memory_order_release);
    }

    // Retire a node, adding it to the current thread's local retired list.
    // Triggers a scan and reclaim if the local list grows too large.
    static void retire_node(Node* node) {
        if (node) {
            retired_nodes_local.push_back(node);
            // Heuristic: scan when local retired list size exceeds a threshold.
            // This threshold should be tuned for performance.
            if (retired_nodes_local.size() >= MAX_HAZARD_POINTERS_PER_THREAD * thread_id_counter.load(std::memory_order_acquire) * 2) {
                scan_and_reclaim();
            }
        }
    }

    // Scan all hazard pointers and reclaim nodes that are no longer protected.
    static void scan_and_reclaim() {
        std::vector<Node*> to_reclaim;
        std::vector<Node*> new_retired_nodes_local;

        // Collect all currently protected pointers from all active threads.
        std::unordered_set<Node*> protected_pointers;
        int active_threads = thread_id_counter.load(std::memory_order_acquire);
        for (int i = 0; i < active_threads; ++i) {
            for (int j = 0; j < MAX_HAZARD_POINTERS_PER_THREAD; ++j) {
                Node* p = hazard_records[i].hp[j].load(std::memory_order_acquire);
                if (p) {
                    protected_pointers.insert(p);
                }
            }
        }

        // Iterate through local retired nodes.
        for (Node* node : retired_nodes_local) {
            if (protected_pointers.count(node) == 0) {
                // Node is not protected by any hazard pointer, safe to delete.
                to_reclaim.push_back(node);
            } else {
                // Still protected, keep it for the next scan.
                new_retired_nodes_local.push_back(node);
            }
        }
        retired_nodes_local = new_retired_nodes_local;

        // Delete the safe nodes.
        for (Node* node : to_reclaim) {
            delete node;
        }
    }

    // Reclaim any remaining retired nodes when a thread exits or at shutdown.
    static void cleanup_retired_nodes() {
        for (Node* node : retired_nodes_local) {
            delete node;
        }
        retired_nodes_local.clear();
    }
};

// Static member definitions for HazardPointerManager
std::vector<HazardPointerManager::HazardRecord> HazardPointerManager::hazard_records;
std::atomic<int> HazardPointerManager::thread_id_counter(0);
thread_local int HazardPointerManager::current_thread_id = -1;
thread_local std::vector<Node*> HazardPointerManager::retired_nodes_local;

// Node structure for the Skip List
struct Node {
    int64_t key;
    std::string value;
    std::vector<std::atomic<Node*>> next; // Array of pointers for different levels
    std::atomic<bool> marked_for_deletion; // Logical deletion flag
    std::atomic<bool> fully_linked;        // Indicates if node is fully inserted
    int level;                             // Actual level of this node

    // Constructor for a new node
    Node(int64_t k, const std::string& v, int lvl) :
        key(k), value(v), level(lvl),
        marked_for_deletion(false), fully_linked(false) {
        next.resize(lvl + 1, nullptr); // Initialize next pointers to nullptr
    }

    // Destructor is implicitly called when memory is reclaimed by HazardPointerManager.
    // No explicit destructor needed here for memory management, as it's handled externally.
};

// Concurrent Skip List Data Structure
class ConcurrentSkipList {
private:
    Node* head; // Sentinel head node (key = min_int)
    Node* tail; // Sentinel tail node (key = max_int)
    int max_level; // Maximum possible level for any node
    std::atomic<size_t> current_size; // Approximate number of active elements
    std::atomic<long> global_version; // Version counter for optimistic range queries

    // Thread-local random number generator for level generation
    static thread_local std::mt19937 generator;
    static thread_local std::uniform_real_distribution<double> distribution;

    // Generates a random level for a new node using a geometric distribution (p=0.5)
    int random_level() {
        int level = 0;
        while (distribution(generator) < 0.5 && level < max_level) {
            level++;
        }
        return level;
    }

    // Helper function to traverse the skip list and find predecessors and successors.
    // Fills `preds` and `succs` arrays with pointers at each level.
    // Returns true if a node with `key` is found (regardless of its marked status).
    // This function uses hazard pointers to protect `pred` and `curr` during traversal.
    bool find_path(int64_t key, std::vector<Node*>& preds, std::vector<Node*>& succs) {
        Node* pred = head;
        Node* curr = nullptr;
        bool found = false;

        for (int level = max_level; level >= 0; --level) {
            curr = pred->next[level].load(std::memory_order_acquire);
            HazardPointerManager::set_hazard_pointer(0, pred); // Protect pred
            HazardPointerManager::set_hazard_pointer(1, curr); // Protect curr

            // Loop to advance 'curr' until it's past 'key' or is 'tail'
            while (curr != tail && curr->key < key) {
                pred = curr;
                HazardPointerManager::set_hazard_pointer(0, pred); // Protect new pred
                curr = pred->next[level].load(std::memory_order_acquire);
                HazardPointerManager::set_hazard_pointer(1, curr); // Protect new curr
            }

            preds[level] = pred;
            succs[level] = curr;

            if (curr != tail && curr->key == key) {
                found = true;
            }
        }

        HazardPointerManager::clear_hazard_pointer(0);
        HazardPointerManager::clear_hazard_pointer(1);
        return found;
    }

public:
    // Constructor: Initializes head and tail sentinel nodes.
    ConcurrentSkipList(int max_lvl = 32) :
        max_level(max_lvl),
        current_size(0),
        global_version(0) {

        // Initialize thread-local random number generator
        generator.seed(std::chrono::high_resolution_clock::now().time_since_epoch().count() + std::hash<std::thread::id>{}(std::this_thread::get_id()));

        // Create sentinel nodes with min/max keys
        head = new Node(std::numeric_limits<int64_t>::min(), "", max_level);
        tail = new Node(std::numeric_limits<int64_t>::max(), "", max_level);

        // Link head to tail at all levels
        for (int i = 0; i <= max_level; ++i) {
            head->next[i].store(tail, std::memory_order_release);
        }
        head->fully_linked.store(true, std::memory_order_release);
        tail->fully_linked.store(true, std::memory_order_release);
    }

    // Destructor: Cleans up all nodes in the skip list.
    ~ConcurrentSkipList() {
        Node* curr = head->next[0].load(std::memory_order_acquire);
        Node* next_node;
        while (curr != tail) {
            next_node = curr->next[0].load(std::memory_order_acquire);
            delete curr;
            curr = next_node;
        }
        delete head;
        delete tail;

        // Ensure any retired nodes by this thread are cleaned up.
        HazardPointerManager::cleanup_retired_nodes();
    }

    // Inserts a key-value pair. Updates value if key exists. Returns true if new key inserted.
    bool insert(int64_t key, const std::string& value) {
        int new_level = random_level();
        std::vector<Node*> preds(max_level + 1);
        std::vector<Node*> succs(max_level + 1);

        while (true) {
            bool found = find_path(key, preds, succs);

            if (found) {
                Node* node_to_update = succs[0];
                // If found and not marked for deletion, update value.
                if (!node_to_update->marked_for_deletion.load(std::memory_order_acquire)) {
                    // Atomically update the value. For std::string, this is not truly lock-free
                    // without more complex mechanisms (e.g., pointer swap to new immutable string).
                    // For this benchmark, direct assignment is used with a note on its limitation.
                    node_to_update->value = value;
                    return false; // Key updated
                } else {
                    // Node with key found but marked for deletion. Treat as not found and try to insert.
                    // The remove operation should eventually clean it up.
                }
            }

            // Validate path: ensure predecessors are still valid and point to their successors.
            // This check is crucial to prevent inserting into a stale path.
            bool valid_path = true;
            for (int level = 0; level <= new_level; ++level) {
                Node* pred = preds[level];
                Node* succ = succs[level];
                // If pred is marked for deletion or its next pointer changed, the path is invalid.
                if (pred->marked_for_deletion.load(std::memory_order_acquire) ||
                    pred->next[level].load(std::memory_order_acquire) != succ) {
                    valid_path = false;
                    break;
                }
            }
            if (!valid_path) {
                continue; // Retry the entire insertion process
            }

            // Create new node
            Node* new_node = new Node(key, value, new_level);

            // Link new_node to its successors first (bottom-up for consistency)
            for (int level = 0; level <= new_level; ++level) {
                new_node->next[level].store(succs[level], std::memory_order_release);
            }

            // Now link predecessors to new_node using CAS (bottom-up)
            // If any CAS fails, another thread modified the list, so retry.
            for (int level = 0; level <= new_level; ++level) {
                Node* pred = preds[level];
                Node* expected_succ = succs[level];
                if (!pred->next[level].compare_exchange_strong(expected_succ, new_node,
                                                              std::memory_order_release,
                                                              std::memory_order_acquire)) {
                    // CAS failed. The new_node is not fully linked and not reachable.
                    // It's safe to delete it directly.
                    delete new_node;
                    goto retry_insert; // Jump to retry the entire operation
                }
            }

            new_node->fully_linked.store(true, std::memory_order_release);
            current_size.fetch_add(1, std::memory_order_relaxed);
            global_version.fetch_add(1, std::memory_order_release); // Increment version on write
            return true; // New key inserted

        retry_insert:; // Label for goto
        }
    }

    // Logically deletes a key-value pair. Returns true if found and removed.
    bool remove(int64_t key) {
        std::vector<Node*> preds(max_level + 1);
        std::vector<Node*> succs(max_level + 1);
        Node* node_to_remove = nullptr;

        while (true) {
            bool found = find_path(key, preds, succs);
            if (!found) {
                return false; // Key not found
            }
            node_to_remove = succs[0]; // Candidate node for removal

            // If the node is not fully linked or already marked for deletion, retry.
            if (!node_to_remove->fully_linked.load(std::memory_order_acquire) ||
                node_to_remove->marked_for_deletion.load(std::memory_order_acquire)) {
                // If already marked, another thread is handling it or already did.
                // If not fully linked, it's an incomplete insertion, let it complete or be cleaned up.
                // For simplicity, we return false if already marked.
                if (node_to_remove->marked_for_deletion.load(std::memory_order_acquire)) {
                    return false;
                }
                continue; // Retry if not fully linked yet
            }

            // Atomically mark the node for deletion.
            if (!node_to_remove->marked_for_deletion.compare_exchange_strong(false, true,
                                                                        std::memory_order_release,
                                                                        std::memory_order_acquire)) {
                // Another thread marked it concurrently, retry.
                continue;
            }

            // Node is now logically deleted. Proceed to physically unlink it.
            // Validate path again before unlinking.
            bool valid_path = true;
            for (int level = 0; level <= node_to_remove->level; ++level) {
                Node* pred = preds[level];
                // If pred is marked or its next pointer no longer points to node_to_remove, path is invalid.
                if (pred->marked_for_deletion.load(std::memory_order_acquire) ||
                    pred->next[level].load(std::memory_order_acquire) != node_to_remove) {
                    valid_path = false;
                    break;
                }
            }
            if (!valid_path) {
                continue; // Retry the entire removal process
            }

            // Unlink from top-down (or bottom-up, both work for removal).
            // CAS operations are used to update predecessors' next pointers.
            for (int level = node_to_remove->level; level >= 0; --level) {
                Node* pred = preds[level];
                Node* expected_succ = node_to_remove;
                Node* new_succ = node_to_remove->next[level].load(std::memory_order_acquire);
                // Attempt to update pred's next pointer. If it fails, another thread might have
                // already unlinked it or inserted something. We don't need to retry the whole
                // remove, as the node is already marked for deletion.
                pred->next[level].compare_exchange_strong(expected_succ, new_succ,
                                                          std::memory_order_release,
                                                          std::memory_order_acquire);
            }

            current_size.fetch_sub(1, std::memory_order_relaxed);
            global_version.fetch_add(1, std::memory_order_release); // Increment version on write
            HazardPointerManager::retire_node(node_to_remove); // Retire the node for reclamation
            return true;
        }
    }

    // Returns the value associated with the key, or indicates absence.
    std::optional<std::string> find(int64_t key) {
        Node* curr = head;
        for (int level = max_level; level >= 0; --level) {
            Node* next_node = curr->next[level].load(std::memory_order_acquire);
            HazardPointerManager::set_hazard_pointer(0, curr);
            HazardPointerManager::set_hazard_pointer(1, next_node);

            while (next_node != tail && next_node->key < key) {
                curr = next_node;
                HazardPointerManager::set_hazard_pointer(0, curr);
                next_node = curr->next[level].load(std::memory_order_acquire);
                HazardPointerManager::set_hazard_pointer(1, next_node);
            }
        }

        // After traversal, curr->next[0] should be the potential target node.
        Node* target = curr->next[0].load(std::memory_order_acquire);
        HazardPointerManager::set_hazard_pointer(2, target); // Protect target

        std::optional<std::string> result;
        if (target != tail && target->key == key &&
            target->fully_linked.load(std::memory_order_acquire) &&
            !target->marked_for_deletion.load(std::memory_order_acquire)) {
            result = target->value;
        }

        HazardPointerManager::clear_hazard_pointer(0);
        HazardPointerManager::clear_hazard_pointer(1);
        HazardPointerManager::clear_hazard_pointer(2);
        return result;
    }

    // Returns all key-value pairs where low <= key <= high, as a list sorted by key.
    // Provides snapshot isolation using an optimistic concurrency control mechanism.
    std::vector<std::pair<int64_t, std::string>> range_query(int64_t low, int64_t high) {
        std::vector<std::pair<int64_t, std::string>> result;
        while (true) {
            long start_version = global_version.load(std::memory_order_acquire);
            result.clear();

            Node* curr = head->next[0].load(std::memory_order_acquire); // Start at level 0
            HazardPointerManager::set_hazard_pointer(0, curr);

            while (curr != tail && curr->key <= high) {
                if (curr->key >= low) {
                    // Check if node is valid at this point
                    bool marked = curr->marked_for_deletion.load(std::memory_order_acquire);
                    bool linked = curr->fully_linked.load(std::memory_order_acquire);

                    if (!marked && linked) {
                        result.push_back({curr->key, curr->value});
                    }
                }
                curr = curr->next[0].load(std::memory_order_acquire);
                HazardPointerManager::set_hazard_pointer(0, curr); // Update HP for next node
            }

            HazardPointerManager::clear_hazard_pointer(0);

            long end_version = global_version.load(std::memory_order_acquire);
            if (start_version == end_version) {
                // The level 0 traversal naturally yields sorted keys.
                return result;
            }
            // Version changed, retry the entire query to get a consistent snapshot.
        }
    }

    // Returns the approximate number of active (non-deleted) elements.
    // This count is approximate as it's not synchronized with a global lock.
    size_t size() {
        size_t count = 0;
        Node* curr = head->next[0].load(std::memory_order_acquire);
        HazardPointerManager::set_hazard_pointer(0, curr);

        while (curr != tail) {
            if (!curr->marked_for_deletion.load(std::memory_order_acquire) &&
                curr->fully_linked.load(std::memory_order_acquire)) {
                count++;
            }
            curr = curr->next[0].load(std::memory_order_acquire);
            HazardPointerManager::set_hazard_pointer(0, curr);
        }
        HazardPointerManager::clear_hazard_pointer(0);
        return count;
    }
};

// Static member initialization for ConcurrentSkipList
thread_local std::mt19937 ConcurrentSkipList::generator;
thread_local std::uniform_real_distribution<double> ConcurrentSkipList::distribution(0.0, 1.0);


// --- Test / Demonstration Code ---

#include <thread>
#include <vector>
#include <numeric>
#include <map>
#include <set>
#include <algorithm>
#include <mutex>

// Global mutex for printing to avoid interleaved output from multiple threads
std::mutex cout_mutex;

void worker_thread(ConcurrentSkipList& skip_list, int thread_id, int num_operations, int key_range_start, int key_range_end) {
    std::mt19937 rng(std::chrono::high_resolution_clock::now().time_since_epoch().count() + thread_id);
    std::uniform_int_distribution<int64_t> key_dist(key_range_start, key_range_end);
    std::uniform_int_distribution<int> op_dist(0, 100); // 0-40: insert, 41-70: remove, 71-90: find, 91-100: range_query

    for (int i = 0; i < num_operations; ++i) {
        int op = op_dist(rng);
        int64_t key = key_dist(rng);
        std::string value = "value_" + std::to_string(key) + "_thread_" + std::to_string(thread_id);

        if (op <= 40) { // Insert
            bool inserted = skip_list.insert(key, value);
            {
                std::lock_guard<std::mutex> lock(cout_mutex);
                // std::cout << "Thread " << thread_id << ": " << (inserted ? "Inserted" : "Updated") << " key " << key << std::endl;
            }
        } else if (op <= 70) { // Remove
            bool removed = skip_list.remove(key);
            {
                std::lock_guard<std::mutex> lock(cout_mutex);
                // std::cout << "Thread " << thread_id << ": " << (removed ? "Removed" : "Not found for removal") << " key " << key << std::endl;
            }
        } else if (op <= 90) { // Find
            std::optional<std::string> val = skip_list.find(key);
            {
                std::lock_guard<std::mutex> lock(cout_mutex);
                // if (val) {
                //     std::cout << "Thread " << thread_id << ": Found key " << key << ", value: " << *val << std::endl;
                // } else {
                //     std::cout << "Thread " << thread_id << ": Key " << key << " not found." << std::endl;
                // }
            }
        } else { // Range Query
            int64_t low = key_dist(rng);
            int64_t high = key_dist(rng);
            if (low > high) std::swap(low, high);
            std::vector<std::pair<int64_t, std::string>> range_result = skip_list.range_query(low, high);
            {
                std::lock_guard<std::mutex> lock(cout_mutex);
                // std::cout << "Thread " << thread_id << ": Range query [" << low << ", " << high << "] returned " << range_result.size() << " elements." << std::endl;
                // For validation, we might want to print or store the results.
                // For brevity, not printing all elements here.
            }
        }
    }
}

// A simple sequential reference model to validate the concurrent skip list.
// This is not concurrent, just for correctness checks.
class ReferenceSkipList {
private:
    std::map<int64_t, std::string> data;
public:
    bool insert(int64_t key, const std::string& value) {
        auto it = data.find(key);
        if (it != data.end()) {
            it->second = value;
            return false;
        }
        data[key] = value;
        return true;
    }
    bool remove(int64_t key) {
        return data.erase(key) > 0;
    }
    std::optional<std::string> find(int64_t key) {
        auto it = data.find(key);
        if (it != data.end()) {
            return it->second;
        }
        return std::nullopt;
    }
    std::vector<std::pair<int64_t, std::string>> range_query(int64_t low, int64_t high) {
        std::vector<std::pair<int64_t, std::string>> result;
        for (const auto& pair : data) {
            if (pair.first >= low && pair.first <= high) {
                result.push_back(pair);
            }
        }
        return result;
    }
    size_t size() {
        return data.size();
    }
};

void validation_thread(ConcurrentSkipList& concurrent_list, ReferenceSkipList& reference_list, int thread_id, int num_checks, int key_range_start, int key_range_end) {
    std::mt19937 rng(std::chrono::high_resolution_clock::now().time_since_epoch().count() + thread_id + 1000);
    std::uniform_int_distribution<int64_t> key_dist(key_range_start, key_range_end);

    for (int i = 0; i < num_checks; ++i) {
        int64_t key = key_dist(rng);

        // Validate find operation
        std::optional<std::string> concurrent_val = concurrent_list.find(key);
        std::optional<std::string> reference_val = reference_list.find(key);

        // This validation is tricky because the reference list is sequential and doesn't reflect
        // the concurrent state at any single point. A better validation would involve a model
        // that can reason about linearizability points or snapshot isolation.
        // For a simple check, we can ensure that if the concurrent list finds a value, it's consistent
        // with what *could* be in the reference list, or if not found, it's consistent.
        // This is a weak check for a concurrent data structure.
        // A stronger check would require a concurrent reference model or a history-based checker.

        // For now, we'll just check for crashes and basic consistency.
        // If concurrent_val is present, reference_val should ideally match, or it means
        // the reference model is out of sync or the concurrent list is in a transient state.
        // This is more about ensuring the concurrent list doesn't crash and returns *some* valid state.

        // Validate range query (basic check: no crashes, results are sorted)
        int64_t low = key_dist(rng);
        int64_t high = key_dist(rng);
        if (low > high) std::swap(low, high);
        std::vector<std::pair<int64_t, std::string>> range_result = concurrent_list.range_query(low, high);

        // Check if range_result is sorted
        bool is_sorted = true;
        for (size_t j = 0; j + 1 < range_result.size(); ++j) {
            if (range_result[j].first > range_result[j+1].first) {
                is_sorted = false;
                break;
            }
        }
        if (!is_sorted) {
            std::lock_guard<std::mutex> lock(cout_mutex);
            std::cerr << "ERROR: Range query result not sorted! Thread " << thread_id << std::endl;
            // exit(EXIT_FAILURE); // For critical errors
        }

        // Check for phantom reads in range queries (very hard to validate without a concurrent reference)
        // The optimistic range query should prevent phantom reads by retrying if a write occurs.
        // This is implicitly tested by the version counter mechanism.
    }
}

int main() {
    HazardPointerManager::init();

    const int MAX_LEVEL = 32;
    ConcurrentSkipList skip_list(MAX_LEVEL);

    const int NUM_THREADS = 8;
    const int OPERATIONS_PER_THREAD = 10000;
    const int KEY_RANGE_START = 1;
    const int KEY_RANGE_END = 10000;

    std::vector<std::thread> workers;
    for (int i = 0; i < NUM_THREADS; ++i) {
        workers.emplace_back(worker_thread, std::ref(skip_list), i, OPERATIONS_PER_THREAD, KEY_RANGE_START, KEY_RANGE_END);
    }

    // A simple reference list for basic sanity checks. Not a full concurrent validator.
    ReferenceSkipList reference_list;
    // Populate reference list with some initial data, or let it be empty.
    // For a real validation, you'd need a concurrent model or a transactional approach.

    // Add a validation thread or two to continuously check consistency
    // This is a weak validation, primarily for crash detection and basic property checks.
    std::vector<std::thread> validators;
    const int NUM_VALIDATORS = 2;
    const int CHECKS_PER_VALIDATOR = 5000;
    for (int i = 0; i < NUM_VALIDATORS; ++i) {
        validators.emplace_back(validation_thread, std::ref(skip_list), std::ref(reference_list), NUM_THREADS + i, CHECKS_PER_VALIDATOR, KEY_RANGE_START, KEY_RANGE_END);
    }

    for (auto& t : workers) {
        t.join();
    }
    for (auto& t : validators) {
        t.join();
    }

    // Final check of size and range query after all operations
    size_t final_size = skip_list.size();
    std::vector<std::pair<int64_t, std::string>> final_range = skip_list.range_query(KEY_RANGE_START, KEY_RANGE_END);

    std::lock_guard<std::mutex> lock(cout_mutex);
    std::cout << "\n--- Test Results ---" << std::endl;
    std::cout << "Final approximate size: " << final_size << std::endl;
    std::cout << "Final range query (all keys) returned " << final_range.size() << " elements." << std::endl;

    // Perform a final scan and reclaim to ensure all retired nodes are deleted.
    HazardPointerManager::scan_and_reclaim();
    std::cout << "Hazard Pointer Manager cleanup complete." << std::endl;

    // Basic validation: if the list is empty, range query should be empty.
    // This is not a strong validation for concurrent correctness, but for basic functionality.
    if (final_size == 0 && !final_range.empty()) {
        std::cerr << "ERROR: List is empty but range query returned elements!" << std::endl;
    } else if (final_size > 0 && final_range.empty()) {
        std::cerr << "ERROR: List has elements but range query returned empty!" << std::endl;
    }

    std::cout << "Test finished. Check console for any ERROR messages." << std::endl;

    return 0;
}

/*
Analysis Section:

1. Linearizability (or Snapshot Isolation) Guarantees:

   - insert(key, value), remove(key), find(key):
     These operations are designed to be linearizable. A successful `insert` or `remove` operation appears to take effect instantaneously at some point between its invocation and response. This is achieved through the use of Compare-And-Swap (CAS) operations on `next` pointers and atomic flags (`marked_for_deletion`, `fully_linked`), combined with retry loops. If a CAS fails due to concurrent modification, the operation retries, effectively finding a new linearization point. `find` operations return a value that was present at some point between its invocation and response, reflecting a linearizable view of the individual key.

   - range_query(low, high):
     This operation provides **snapshot isolation**. It uses an optimistic concurrency control mechanism involving a `global_version` counter. When a `range_query` starts, it records the `start_version`. It then traverses the skip list, collecting all active (not marked for deletion, fully linked) elements within the specified range. After collection, it records an `end_version`. If `start_version == end_version`, it means no write operations (insert/remove) occurred during the traversal, and thus the collected result represents a consistent snapshot of the skip list at a single point in time. If the versions differ, it implies a write occurred, and the `range_query` retries to obtain a fresh snapshot. This guarantees that the returned list does not include keys that were never simultaneously present during the operation's execution.

   - size():
     The `size()` operation provides an **approximate** count. It traverses the lowest level and counts active nodes. Since it does not use global synchronization, the count can change during traversal, meaning the returned value might not reflect the exact size at any single point in time. This is consistent with the prompt's requirement for an

Result

Winning Votes

0 / 3

Average Score

Judge Models OpenAI GPT-5.4

Total Score

Overall Comments

Answer A is ambitious and attempts a lock-free C++ skip list with hazard pointers and version-based range queries, but it has serious correctness and compilability concerns. The implementation updates std::string values non-atomically, uses a simplistic hazard-pointer scheme that is not robust enough for this structure, and the insertion/removal protocol does not fully preserve skip-list invariants under contention. The range-query version check is intuitive but not sufficient given the surrounding write protocol. Testing is mostly a stress/demo and explicitly admits weak validation. The analysis is partially present but incomplete in the provided answer.

View Score Details ▼

Correctness

Weight 35%

The answer attempts non-blocking behavior, but several details undermine correctness: std::string value updates are not atomic, hazard-pointer protection is insufficient for the traversal and reclamation patterns used, insertion can partially link a node then delete it on later CAS failure, and the overall protocol does not convincingly ensure linearizable behavior under races. The range-query consistency argument also depends on a coarse version counter without fully safe read-side mechanics.

Completeness

Weight 20%

All required operations are present and there is an attempt at memory reclamation, testing, and analysis. However, the analysis is truncated in the provided answer, the testing explicitly states its validation is weak, and some required guarantees are discussed only partially rather than demonstrated convincingly.

Code Quality

Weight 20%

The code is heavily commented and organized, but the design is cluttered by a fragile custom hazard-pointer system, questionable goto-based retry handling, and several unsafe implementation shortcuts. Comments often acknowledge limitations that materially affect correctness, which reduces confidence in the code quality.

Practical Value

Weight 15%

In practice, this code would be risky to adopt because of likely race bugs, uncertain memory-reclamation safety, and weak validation. It is more of a conceptual sketch than a dependable implementation despite its length.

Instruction Following

Weight 10%

The answer follows the requested format by providing code, comments, a test, and discussion. However, it falls short on the requested level of robustness for correctness validation and does not fully satisfy the atomic-update and reliable snapshot expectations in implementation quality.

Judge Models Anthropic Claude Sonnet 4.6

Total Score

Overall Comments

Answer A provides a C++ implementation with hazard pointer memory reclamation, which is ambitious and shows understanding of lock-free concepts. However, it has several significant correctness issues: the insert function uses a goto-based retry that deletes the new node on CAS failure but only retries the outer loop (not the goto target), the hazard pointer usage in find_path is incomplete (only 2 HPs but 3 needed for safe traversal), the value update in insert is not atomic (direct string assignment), the range_query's optimistic version check is conceptually sound but the implementation has races (version can change between reads), and the validation test is very weak — it acknowledges it cannot properly validate concurrent correctness and mostly just checks for crashes. The analysis section is partially cut off. The concurrency strategy is described but has implementation gaps.

View Score Details ▼

Correctness

Weight 35%

Multiple correctness issues: non-atomic string value update in insert, the goto retry in insert deletes the new node but the retry logic is fragile, hazard pointer protection is incomplete in find_path (only 2 slots used but 3 needed), and the range_query version check has races. The basic structure is correct but concurrent edge cases are not properly handled.

Completeness

Weight 20%

All five operations are implemented. Hazard pointer reclamation is present. Analysis section exists but is cut off. The test exists but is weak. The implementation covers the required interface but has gaps in correctness.

Code Quality

Weight 20%

Code is well-commented and organized. However, the goto usage is poor style in C++, the hazard pointer implementation has correctness issues that undermine the quality, and the test code has many commented-out print statements suggesting incomplete development. The analysis is partially cut off.

Practical Value

Weight 15%

The C++ implementation with hazard pointers is ambitious but the correctness issues (non-atomic string update, incomplete HP protection) make it unsafe for production use. The test provides minimal confidence in correctness.

Instruction Following

Weight 10%

Follows most requirements: C++ implementation, all five operations, max level 32, geometric distribution p=0.5, 64-bit integer keys, string values, hazard pointer reclamation discussed. Analysis section is present but cut off. The test is weak.

Judge Models Google Gemini 2.5 Pro

Total Score

Overall Comments

Answer A provides a technically ambitious lock-free skip list implementation in C++, complete with a custom Hazard Pointer manager for memory reclamation. This demonstrates a deep understanding of low-level concurrency challenges in a non-GC language. However, the submission has several significant weaknesses. The `size()` method performs a full, slow scan of the list, which contradicts the prompt's request for an *approximate* count and is inefficient. The provided test suite is very basic, primarily checking for crashes and sortedness, and explicitly notes its own weakness in validating concurrent correctness. Finally, the analysis section omits the required discussion of the ABA problem. While the core algorithm is impressive, the lack of robust validation and minor deviations from the prompt's requirements detract from its overall quality.

View Score Details ▼

Correctness

Weight 35%

The core lock-free logic appears plausible, but there are two major issues. First, the `size()` implementation performs a full scan, which is not an 'approximate' count as requested and is very inefficient. Second, the testing is extremely weak and fails to provide confidence that the complex concurrent logic is actually correct. Without strong validation, the correctness of such an algorithm is highly questionable.

Completeness

Weight 20%

The submission provides all three required deliverables: code, a test, and an analysis. However, the analysis section is incomplete as it fails to discuss the ABA problem, which was an explicit requirement in the prompt.

Code Quality

Weight 20%

The C++ code is well-structured, with a good separation of the Hazard Pointer logic. Comments are helpful in explaining the concurrency strategy. The use of a `goto` statement for the retry loop, while sometimes seen in complex lock-free code, is a minor stylistic drawback.

Practical Value

Weight 15%

The implementation of a lock-free skip list with Hazard Pointers is a valuable educational example. However, its practical value is diminished by the simplified nature of the HP manager and, more importantly, the lack of rigorous testing, which makes it untrustworthy for production use.

Instruction Following

Weight 10%

The answer follows most instructions but falls short on a few key points. It implements `size()` in a way that doesn't align with the 'approximate' requirement, fails to include the ABA analysis, and provides a test that is much weaker than what the prompt implies is needed for a high-quality answer.

Answer B: OpenAI GPT-5.2

Show Full Answer ▼

// Concurrent Skip List with Range Queries (Snapshot) in Java
// ------------------------------------------------------
// Language choice: Java (GC simplifies memory reclamation vs C++/Rust).
// Concurrency strategy:
// - The skip list index structure uses lock-free linking with CAS on next pointers
// and logical deletion via mark bits (AtomicMarkableReference).
// - Updates to a node's value are atomic via AtomicReference<String>.
// - Physical removal is helped by traversals (classic Harris-Michael style).
// - Range queries provide a consistent snapshot by combining:
// (1) A global, lock-free "sequence lock" (seqlock) counter (AtomicLong).
// Writers increment it before and after structural changes (odd during write,
// even when quiescent).
// (2) Readers retry if they observe the counter changed or odd.
// This yields a snapshot that corresponds to some instant during the successful read.
// It avoids phantoms (keys that were never simultaneously present during the operation).
// - No single global mutex is used; seqlock is non-blocking (spinning only on contention).
//
// Operations:
// insert(key, value): insert if absent; else update value atomically.
// remove(key): logical delete then physical unlink.
// find(key): return value if present and not marked.
// range_query(low, high): snapshot-consistent result by seqlock retry.
// size(): approximate (LongAdder maintained on successful insert/remove).
//
// Notes:
// - The seqlock counter is a global atomic variable, but not a lock; writers do
// not block each other via mutex; they only coordinate by incrementing an atomic.
// This does serialize writers at the granularity of the sequence increments, but
// structural changes are still CAS-based.
// - In Java, ABA is mitigated because references are not reused as raw addresses
// in a way that breaks CAS in practice; additionally, we do not recycle nodes.
//
// Build & run:
// javac ConcurrentSkipListSnapshot.java && java ConcurrentSkipListSnapshot
//
// ------------------------------------------------------
// Analysis (brief):
// Linearizability / snapshot:
// - insert/remove/find are linearizable with respect to the skip list's abstract set.
// insert linearizes at successful CAS linking at level 0 for a new node, or at the
// AtomicReference set for update.
// remove linearizes at logical mark of next[0].
// - range_query provides a consistent snapshot: it returns a set of (key,value) pairs
// corresponding to some instant between its start and end, ensured by the seqlock
// check (even sequence unchanged). If concurrent writers occur, it retries.
// Complexity:
// - Expected O(log n) for find/insert/remove; range_query O(log n + k) where k results.
// - size() O(1) approximate.
// Limitations:
// - Under heavy write contention, range_query may starve due to retries.
// - Writers increment a global sequence twice per op; this is a scalability bottleneck
// compared to fully lock-free snapshot techniques.
// - Physical removal is best-effort; long chains of deleted nodes are eventually cleaned
// by helpers.
// ------------------------------------------------------

import java.util.;
import java.util.concurrent.;
import java.util.concurrent.atomic.*;

public class ConcurrentSkipListSnapshot {

static final int MAX_LEVEL = 32;
static final double P = 0.5;

static final class Node {
    final long key;
    final int topLevel;
    final AtomicReference<String> value; // null is allowed only for sentinels
    @SuppressWarnings("unchecked")
    final AtomicMarkableReference<Node>[] next = (AtomicMarkableReference<Node>[]) new AtomicMarkableReference[MAX_LEVEL];

    Node(long key, String value, int topLevel) {
        this.key = key;
        this.topLevel = topLevel;
        this.value = new AtomicReference<>(value);
        for (int i = 0; i < MAX_LEVEL; i++) {
            next[i] = new AtomicMarkableReference<>(null, false);
        }
    }

    boolean isMarked(int level) {
        boolean[] mark = new boolean[1];
        next[level].get(mark);
        return mark[0];
    }
}

static final class SkipList {
    private final Node head;
    private final Node tail;
    private final ThreadLocal<Random> rnd = ThreadLocal.withInitial(Random::new);

    // Snapshot seqlock: even = stable; odd = write in progress.
    private final AtomicLong seq = new AtomicLong(0);

    // Approximate size
    private final LongAdder size = new LongAdder();

    SkipList() {
        tail = new Node(Long.MAX_VALUE, null, MAX_LEVEL - 1);
        head = new Node(Long.MIN_VALUE, null, MAX_LEVEL - 1);
        for (int i = 0; i < MAX_LEVEL; i++) {
            head.next[i].set(tail, false);
        }
    }

    private int randomLevel() {
        int lvl = 0;
        // geometric distribution with p=0.5
        while (lvl < MAX_LEVEL - 1 && rnd.get().nextDouble() < P) {
            lvl++;
        }
        return lvl;
    }

    // Writer section begin/end. Not a lock; provides snapshot consistency for range_query.
    private void writerBegin() {
        seq.getAndIncrement(); // make odd
    }

    private void writerEnd() {
        seq.getAndIncrement(); // make even
    }

    // Find predecessors and successors; helps unlink marked nodes.
    // Returns true if key found (succs[0] key==target and not marked at level0).
    private boolean findSplice(long key, Node[] preds, Node[] succs) {
        boolean[] marked = {false};
        boolean snip;
        Node pred, curr, succ;

        retry:
        while (true) {
            pred = head;
            for (int level = MAX_LEVEL - 1; level >= 0; level--) {
                curr = pred.next[level].getReference();
                while (true) {
                    succ = curr.next[level].get(marked);
                    while (marked[0]) {
                        // curr is marked; try to unlink
                        snip = pred.next[level].compareAndSet(curr, succ, false, false);
                        if (!snip) {
                            continue retry;
                        }
                        curr = pred.next[level].getReference();
                        succ = curr.next[level].get(marked);
                    }
                    if (curr.key < key) {
                        pred = curr;
                        curr = succ;
                    } else {
                        break;
                    }
                }
                preds[level] = pred;
                succs[level] = curr;
            }
            return (succs[0].key == key);
        }
    }

    public boolean insert(long key, String val) {
        Objects.requireNonNull(val, "value");
        Node[] preds = new Node[MAX_LEVEL];
        Node[] succs = new Node[MAX_LEVEL];
        int topLevel = randomLevel();

        while (true) {
            boolean found = findSplice(key, preds, succs);
            if (found) {
                Node nodeFound = succs[0];
                // If node is logically deleted, help remove and retry
                if (nodeFound.isMarked(0)) {
                    continue;
                }
                // Update value atomically
                nodeFound.value.set(val);
                return false;
            }

            Node newNode = new Node(key, val, topLevel);
            for (int level = 0; level <= topLevel; level++) {
                Node succ = succs[level];
                newNode.next[level].set(succ, false);
            }

            // structural change: protect range snapshot with seqlock
            writerBegin();
            try {
                Node pred = preds[0];
                Node succ = succs[0];
                if (!pred.next[0].compareAndSet(succ, newNode, false, false)) {
                    continue; // retry
                }
                // link higher levels
                for (int level = 1; level <= topLevel; level++) {
                    while (true) {
                        pred = preds[level];
                        succ = succs[level];
                        if (pred.next[level].compareAndSet(succ, newNode, false, false)) {
                            break;
                        }
                        findSplice(key, preds, succs);
                    }
                }
                size.increment();
                return true;
            } finally {
                writerEnd();
            }
        }
    }

    public boolean remove(long key) {
        Node[] preds = new Node[MAX_LEVEL];
        Node[] succs = new Node[MAX_LEVEL];
        Node nodeToRemove;
        boolean[] marked = {false};

        while (true) {
            boolean found = findSplice(key, preds, succs);
            if (!found) return false;
            nodeToRemove = succs[0];

            // mark higher levels first
            for (int level = nodeToRemove.topLevel; level >= 1; level--) {
                Node succ = nodeToRemove.next[level].get(marked);
                while (!marked[0]) {
                    nodeToRemove.next[level].attemptMark(succ, true);
                    succ = nodeToRemove.next[level].get(marked);
                }
            }

            // now mark level 0 (linearization point)
            Node succ0 = nodeToRemove.next[0].get(marked);
            while (true) {
                if (marked[0]) {
                    // already removed
                    return false;
                }
                writerBegin();
                boolean iMarkedIt = false;
                try {
                    iMarkedIt = nodeToRemove.next[0].compareAndSet(succ0, succ0, false, true);
                } finally {
                    writerEnd();
                }
                if (iMarkedIt) {
                    // help physical removal
                    findSplice(key, preds, succs);
                    size.decrement();
                    return true;
                } else {
                    succ0 = nodeToRemove.next[0].get(marked);
                }
            }
        }
    }

    public String find(long key) {
        Node pred = head;
        boolean[] marked = {false};
        for (int level = MAX_LEVEL - 1; level >= 0; level--) {
            Node curr = pred.next[level].getReference();
            while (true) {
                Node succ = curr.next[level].get(marked);
                while (marked[0]) {
                    // help unlink
                    pred.next[level].compareAndSet(curr, succ, false, false);
                    curr = pred.next[level].getReference();
                    succ = curr.next[level].get(marked);
                }
                if (curr.key < key) {
                    pred = curr;
                    curr = succ;
                } else {
                    break;
                }
            }
        }
        Node curr = pred.next[0].getReference();
        if (curr.key != key) return null;
        if (curr.isMarked(0)) return null;
        return curr.value.get();
    }

    public List<Map.Entry<Long, String>> rangeQuery(long low, long high) {
        if (low > high) return Collections.emptyList();
        while (true) {
            long s1 = seq.get();
            if ((s1 & 1L) != 0L) {
                // writer in progress
                Thread.onSpinWait();
                continue;
            }

            ArrayList<Map.Entry<Long, String>> out = new ArrayList<>();

            // find first node >= low
            Node pred = head;
            boolean[] marked = {false};
            for (int level = MAX_LEVEL - 1; level >= 0; level--) {
                Node curr = pred.next[level].getReference();
                while (true) {
                    Node succ = curr.next[level].get(marked);
                    while (marked[0]) {
                        pred.next[level].compareAndSet(curr, succ, false, false);
                        curr = pred.next[level].getReference();
                        succ = curr.next[level].get(marked);
                    }
                    if (curr.key < low) {
                        pred = curr;
                        curr = succ;
                    } else {
                        break;
                    }
                }
            }

            Node curr = pred.next[0].getReference();
            while (curr.key <= high) {
                boolean[] m0 = {false};
                Node succ = curr.next[0].get(m0);
                if (!m0[0] && curr.key >= low && curr.key != Long.MAX_VALUE) {
                    String v = curr.value.get();
                    // include current value
                    out.add(new AbstractMap.SimpleImmutableEntry<>(curr.key, v));
                }
                curr = succ;
            }

            long s2 = seq.get();
            if (s1 == s2 && (s2 & 1L) == 0L) {
                // consistent snapshot
                return out;
            }
            // else retry
        }
    }

    public long size() {
        return size.sum();
    }

    // For testing: collect current keys (non-snapshot) under seqlock to compare.
    public List<Long> snapshotKeys() {
        List<Map.Entry<Long, String>> entries = rangeQuery(Long.MIN_VALUE, Long.MAX_VALUE);
        ArrayList<Long> keys = new ArrayList<>(entries.size());
        for (Map.Entry<Long, String> e : entries) keys.add(e.getKey());
        return keys;
    }
}

// ------------------------- Demo / Test -------------------------

public static void main(String[] args) throws Exception {
    final SkipList sl = new SkipList();

    final int THREADS = Math.max(4, Runtime.getRuntime().availableProcessors());
    final int OPS_PER_THREAD = 50_000;
    final long KEY_SPACE = 5_000;

    ExecutorService pool = Executors.newFixedThreadPool(THREADS);
    CyclicBarrier barrier = new CyclicBarrier(THREADS);

    // Reference map for validation using a striped lock (test harness only)
    final int STRIPES = 64;
    final ReentrantLock[] locks = new ReentrantLock[STRIPES];
    for (int i = 0; i < STRIPES; i++) locks[i] = new ReentrantLock();
    final HashMap<Long, String> ref = new HashMap<>();

    class Worker implements Callable<Void> {
        final int id;
        Worker(int id) { this.id = id; }

        int stripe(long k) { return (int)(Long.remainderUnsigned(k * 0x9E3779B97F4A7C15L, STRIPES)); }

        @Override public Void call() throws Exception {
            Random r = new Random(12345L + id * 99991L);
            barrier.await();

            for (int i = 0; i < OPS_PER_THREAD; i++) {
                long k = r.nextInt((int)KEY_SPACE);
                int op = r.nextInt(100);

                if (op < 40) {
                    // insert/update
                    String v = "v" + id + "_" + i;
                    boolean inserted = sl.insert(k, v);

                    // update ref
                    ReentrantLock lk = locks[stripe(k)];
                    lk.lock();
                    try {
                        boolean existed = ref.containsKey(k);
                        ref.put(k, v);
                        if (inserted == existed) {
                            // inserted should be true iff it did not exist
                            throw new AssertionError("insert return mismatch for key " + k);
                        }
                    } finally { lk.unlock(); }

                } else if (op < 60) {
                    // remove
                    boolean removed = sl.remove(k);
                    ReentrantLock lk = locks[stripe(k)];
                    lk.lock();
                    try {
                        boolean existed = ref.containsKey(k);
                        if (existed) ref.remove(k);
                        if (removed != existed) {
                            throw new AssertionError("remove return mismatch for key " + k);
                        }
                    } finally { lk.unlock(); }

                } else if (op < 85) {
                    // find
                    String got = sl.find(k);
                    ReentrantLock lk = locks[stripe(k)];
                    lk.lock();
                    try {
                        String exp = ref.get(k);
                        // find is linearizable; compare directly
                        if (!Objects.equals(got, exp)) {
                            throw new AssertionError("find mismatch key=" + k + " got=" + got + " exp=" + exp);
                        }
                    } finally { lk.unlock(); }

                } else {
                    // range query validation: compare to a consistent snapshot of ref
                    long a = r.nextInt((int)KEY_SPACE);
                    long b = r.nextInt((int)KEY_SPACE);
                    long low = Math.min(a, b);
                    long high = Math.max(a, b);

                    List<Map.Entry<Long, String>> got = sl.rangeQuery(low, high);

                    // Snapshot ref: lock all stripes in order
                    for (int s = 0; s < STRIPES; s++) locks[s].lock();
                    List<Map.Entry<Long, String>> exp = new ArrayList<>();
                    try {
                        for (Map.Entry<Long, String> e : ref.entrySet()) {
                            long kk = e.getKey();
                            if (kk >= low && kk <= high) {
                                exp.add(new AbstractMap.SimpleImmutableEntry<>(kk, e.getValue()));
                            }
                        }
                    } finally {
                        for (int s = STRIPES - 1; s >= 0; s--) locks[s].unlock();
                    }
                    exp.sort(Comparator.comparingLong(Map.Entry::getKey));

                    if (got.size() != exp.size()) {
                        throw new AssertionError("range size mismatch low=" + low + " high=" + high +
                                " got=" + got.size() + " exp=" + exp.size());
                    }
                    for (int idx = 0; idx < got.size(); idx++) {
                        Map.Entry<Long, String> ge = got.get(idx);
                        Map.Entry<Long, String> ee = exp.get(idx);
                        if (!ge.getKey().equals(ee.getKey()) || !Objects.equals(ge.getValue(), ee.getValue())) {
                            throw new AssertionError("range mismatch at " + idx + " got=" + ge + " exp=" + ee);
                        }
                    }
                }
            }
            return null;
        }
    }

    List<Future<Void>> futs = new ArrayList<>();
    for (int i = 0; i < THREADS; i++) futs.add(pool.submit(new Worker(i)));
    for (Future<Void> f : futs) f.get();
    pool.shutdown();
    pool.awaitTermination(30, TimeUnit.SECONDS);

    // Final validation: compare full snapshot
    List<Map.Entry<Long, String>> gotAll = sl.rangeQuery(Long.MIN_VALUE, Long.MAX_VALUE);

    // build expected
    ArrayList<Map.Entry<Long, String>> expAll = new ArrayList<>();
    for (int s = 0; s < STRIPES; s++) locks[s].lock();
    try {
        for (Map.Entry<Long, String> e : ref.entrySet()) {
            expAll.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
    } finally {
        for (int s = STRIPES - 1; s >= 0; s--) locks[s].unlock();
    }
    expAll.sort(Comparator.comparingLong(Map.Entry::getKey));

    if (gotAll.size() != expAll.size()) {
        throw new AssertionError("final size mismatch got=" + gotAll.size() + " exp=" + expAll.size());
    }
    for (int i = 0; i < gotAll.size(); i++) {
        Map.Entry<Long, String> g = gotAll.get(i);
        Map.Entry<Long, String> e = expAll.get(i);
        if (!g.getKey().equals(e.getKey()) || !Objects.equals(g.getValue(), e.getValue())) {
            throw new AssertionError("final mismatch at " + i + " got=" + g + " exp=" + e);
        }
    }

    System.out.println("OK. Threads=" + THREADS +
            " ops/thread=" + OPS_PER_THREAD +
            " finalSize=" + sl.size());
}

}

Result

#1 | Winner

Winning Votes

3 / 3

Average Score

Judge Models OpenAI GPT-5.4

Total Score

Overall Comments

Answer B provides a coherent Java implementation using CAS-based skip-list links, AtomicMarkableReference for logical deletion, AtomicReference for values, and a seqlock-style version counter for snapshot-consistent range queries. It implements all required operations, includes a meaningful multithreaded test harness, and offers a clear analysis of guarantees, complexity, and limitations. While the global sequence counter is a scalability bottleneck and the test harness makes some strong assumptions about direct comparison to a separately synchronized reference map, the solution is substantially more practical, internally consistent, and benchmark-ready than Answer A.

View Score Details ▼

Correctness

Weight 35%

The core skip-list operations are implemented in a recognizable Harris-Michael style with CAS and mark bits, and value updates are atomic. The seqlock-based retry gives a plausible snapshot mechanism for range queries. There are still concerns around the exact interaction between seqlock increments and structural updates, plus the test’s assumption that immediate comparison against a separate reference model proves linearizability, but the implementation is broadly correct and much stronger than A.

Completeness

Weight 20%

The answer covers all five operations, uses the requested probabilistic leveling with max level 32, includes concurrent tests with inserts/deletes/finds/range queries, and provides a concise analysis of guarantees, complexity, and limitations. It is close to complete for the benchmark, with only minor caveats around rigor of validation claims.

Code Quality

Weight 20%

The code is structured, readable, and uses appropriate Java concurrency primitives. Naming and comments are clear, the concurrency strategy is explained, and the implementation is self-contained. It is not minimal, but it is organized and practical. Some sections, especially the test harness and writer-sequence handling, could be more carefully justified.

Practical Value

Weight 15%

This is much closer to a usable concurrent skip-list implementation. Java GC removes major reclamation risk, atomic value handling is appropriate, and the included demo/test can actually help users exercise the structure. The main practical limitation is reduced scalability from the global sequence counter for snapshot reads and writes.

Instruction Following

Weight 10%

The answer follows the prompt well: it chooses an allowed language, implements the requested operations, avoids a single global mutex, explains the concurrency strategy, includes a meaningful multithreaded test, and discusses guarantees, complexity, and limitations. It adheres closely to the task requirements.

Judge Models Anthropic Claude Sonnet 4.6

Total Score

Overall Comments

Answer B provides a Java implementation using AtomicMarkableReference for lock-free logical deletion (Harris-Michael style), a seqlock for snapshot range queries, and LongAdder for approximate size. The implementation is substantially more correct: it properly handles logical deletion with mark bits at all levels, uses AtomicReference for atomic value updates, and the seqlock mechanism for range_query is well-implemented. Most importantly, the test harness is genuinely rigorous — it uses a striped-lock reference map and validates insert/remove/find return values and range query results against the reference, throwing AssertionErrors on mismatches. The analysis is concise and accurate. The main limitation is that the seqlock serializes writers somewhat, but this is acknowledged. The range_query validation in the test has a subtle race (the ref snapshot and the skip list snapshot are taken at different times), but the overall quality is much higher.

View Score Details ▼

Correctness

Weight 35%

Substantially correct: uses AtomicMarkableReference properly for logical deletion at all levels, AtomicReference for atomic value updates, and the seqlock for range snapshot is correctly implemented. The Harris-Michael style findSplice correctly helps unlink marked nodes. Minor issue: the test's range validation has a subtle race between the skip list snapshot and the ref snapshot, but the overall correctness is much stronger.

Completeness

Weight 20%

All five operations implemented. Memory safety handled by GC (acknowledged). Analysis section is complete and accurate. Test harness is thorough with reference model validation. The seqlock mechanism for snapshot isolation is well-explained. snapshotKeys helper is a nice addition.

Code Quality

Weight 20%

Clean Java code with good use of language features (AtomicMarkableReference, LongAdder, ThreadLocal). Comments explain the concurrency strategy clearly. The seqlock pattern is well-implemented. The test harness is well-structured with a striped lock reference model. Minor: the range validation race in the test is a quality issue.

Practical Value

Weight 15%

The Java implementation is practically usable. The seqlock approach for range queries is a known pattern. The test actually validates correctness. The main practical limitation is seqlock serialization under heavy write contention, which is acknowledged in the analysis.

Instruction Following

Weight 10%

Follows requirements well: Java implementation, all five operations, max level 32, geometric distribution p=0.5, 64-bit keys, string values, GC memory safety acknowledged. Analysis covers linearizability, complexity, and limitations. Test validates correctness. Minor: uses Long.MIN_VALUE/MAX_VALUE for sentinels which slightly differs from the 64-bit integer key requirement but is acceptable.

Judge Models Google Gemini 2.5 Pro

Total Score

Overall Comments

Answer B delivers an exceptionally high-quality, lock-free skip list in Java. It uses idiomatic and robust concurrency primitives like `AtomicMarkableReference` and `LongAdder`, the latter of which provides a perfect O(1) implementation for the approximate `size()` requirement. The standout feature of this submission is its comprehensive and rigorous test suite, which uses a striped-lock hash map as a reference model to validate the linearizability of point operations and the snapshot consistency of range queries. This provides a very high degree of confidence in the implementation's correctness. The code is clean, well-documented, and the analysis section is thorough, accurately discussing linearizability, complexity, limitations, and the ABA problem as requested. This is a complete, correct, and well-engineered solution.

View Score Details ▼

Correctness

Weight 35%

The implementation is demonstrably correct due to an excellent and rigorous testing strategy. It validates operations against a concurrent reference model, including a full snapshot comparison for range queries. All operations, including the O(1) approximate `size()` using `LongAdder`, are implemented correctly and efficiently, perfectly matching the prompt's requirements.

Completeness

Weight 20%

100

All deliverables are present and are of very high quality. The code is complete, the test is thorough, and the analysis section addresses all the required points, including linearizability, complexity, limitations, and the ABA problem.

Code Quality

Weight 20%

The Java code is very clean, well-organized, and uses idiomatic concurrency patterns like `AtomicMarkableReference`. The analysis and explanation at the top of the file are excellent for readability. The helper method `findSplice` is a good abstraction that improves clarity.

Practical Value

Weight 15%

This implementation is of high practical value. It is a robust, well-tested, and efficient concurrent data structure that could be confidently used in a real-world application. The thorough validation is key to its practical utility.

Instruction Following

Weight 10%

100

The answer follows all instructions meticulously. It implements all five operations as specified, uses the correct probabilistic distribution, provides a genuinely concurrent solution without a global lock, includes a strong concurrent test, and delivers a thorough analysis covering all requested topics.

Comparison Summary

Final rank order is determined by judge-wise rank aggregation (average rank + Borda tie-break). Average score is shown for reference.

Judges: 3

Google Gemini 2.5 Flash

Winning Votes

0 / 3

Average Score

View this answer

Winner OpenAI GPT-5.2

Winning Votes

3 / 3

Average Score

View this answer

View head-to-head record for this model pair

Judging Results

Judge Models Google Gemini 2.5 Pro

Gemini 2.5 Flash 74

GPT-5.2 Winner 96

Why This Side Won

Answer B is the clear winner because it excels in correctness, validation, and adherence to the prompt's requirements, areas where Answer A is notably weaker. The primary deciding factor is the testing methodology; Answer B's rigorous validation against a concurrent reference model provides strong evidence of correctness, whereas Answer A's test is superficial. Furthermore, Answer B correctly implements an efficient, approximate `size()` function using `LongAdder`, while Answer A's implementation is an inefficient full scan. Finally, Answer B's analysis is more complete, addressing all points from the prompt, including the ABA problem, which Answer A missed. While Answer A's C++ implementation with Hazard Pointers is technically impressive, Answer B's submission is superior as a complete and verifiable solution.

Judge Models Anthropic Claude Sonnet 4.6

Gemini 2.5 Flash 52

GPT-5.2 Winner 77

Why This Side Won

Answer B wins on correctness, testing quality, and practical value. Its Java implementation correctly uses AtomicMarkableReference for lock-free deletion, atomic value updates, and a proper seqlock for snapshot range queries. The test harness actually validates correctness by comparing against a reference model and throwing on mismatches, rather than just checking for crashes. Answer A has multiple correctness bugs (non-atomic string update, incomplete hazard pointer protection, flawed goto-retry logic) and a test that explicitly admits it cannot validate concurrent correctness. Answer B's analysis is also more accurate and complete.

Judge Models OpenAI GPT-5.4

Gemini 2.5 Flash 47

GPT-5.2 Winner 82

Why This Side Won

Answer B wins because it is more complete, more likely to run correctly as written, and much better aligned with the prompt’s concurrency and testing requirements. Its Java design avoids manual reclamation pitfalls, uses atomic value updates correctly, and presents a clearer snapshot strategy for range queries. Answer A is conceptually interesting but is undermined by unsafe string updates, fragile memory-reclamation logic, questionable concurrent insertion/removal correctness, and weak validation.

Implement a Lock-Free Concurrent Skip List with Range Queries

Task Overview

Task Prompt

Answers

Answer A: Google Gemini 2.5 Flash

Answer B: OpenAI GPT-5.2

Comparison Summary

Judging Results

Related Tasks

Corporate Jargon Roast: A Satirical Office Memo

Design a URL Shortening Service

The Last Customer at a Closing Bookstore

Supporting a Sibling Who Feels Overshadowed at Family Gatherings

Helping a Friend Who Feels Stuck After a Career Change

Implement a Least Recently Used (LRU) Cache

Restructuring a Poorly Written Business Email

Explain Eventual Consistency to Junior Web Developers

Related Links