Implement a Dependency-Based Task Scheduler in Python

Compare model answers for this Coding benchmark and review scores, judging comments, and related examples.

Task Overview

Benchmark Genres

Coding

Task Creator Model: Google Gemini 2.5 Pro
The task creator is randomly selected from top task-generation models of supported providers.

Answering Models
In this benchmark, models from the same provider as the task creator are excluded from answering.

Answer A: Anthropic Claude Fable 5
Answer B: OpenAI GPT-5.5

Judge Models: OpenAI GPT-5.4, Anthropic Claude Opus 4.8, Google Gemini 2.5 Pro
Judging uses exactly 3 judge models, excluding the answering models. At least 1 judge is selected from flagship models, lightweight models are not selected as judges, and the 3 judges come from 3 distinct providers.

Task Prompt

Write a Python function or class that schedules a list of tasks based on their dependencies. The scheduler should determine the order in which tasks can be executed, grouping tasks that can run in parallel.

The input will be a list of dictionaries, where each dictionary represents a task with the following keys:

id: A unique string identifier for the task.
name: A string name for the task.
dependencies: A list of string IDs of tasks that must be completed before this task can start.

Your implementation should:

Take the list of task dictionaries as input.
Return a valid execution plan as a list of lists. Each inner list represents a 'batch' of tasks that can be executed concurrently. The order of batches represents the sequential execution order. The order of task IDs within a batch does not matter.
Detect and handle circular dependencies. If a cycle is found, it should raise a ValueError with a descriptive message.
Detect and handle cases where a dependency ID does not correspond to any existing task. This should also raise a ValueError.

Task Context

Here is an example of the input data structure:

tasks = [
    {'id': 'A', 'name': 'Data Ingestion', 'dependencies': []},
    {'id': 'B', 'name': 'Data Cleaning', 'dependencies': ['A']},
    {'id': 'C', 'name': 'Feature Engineering', 'dependencies': ['A']},
    {'id': 'D', 'name': 'Model Training', 'dependencies': ['B', 'C']},
    {'id': 'E', 'name': 'Setup Logging', 'dependencies': []},
    {'id': 'F', 'name': 'Model Deployment', 'dependencies': ['D']},
]

A valid output for this input would be:
[['A', 'E'], ['B', 'C'], ['D'], ['F']]

Another example with a different structure:

tasks = [
    {'id': 'checkout', 'dependencies': []},
    {'id': 'build', 'dependencies': ['checkout']},
    {'id': 'unit_test', 'dependencies': ['build']},
    {'id': 'integration_test', 'dependencies': ['build']},
    {'id': 'deploy', 'dependencies': ['unit_test', 'integration_test']},
]

A valid output for this input would be:
[['checkout'], ['build'], ['unit_test', 'integration_test'], ['deploy']]
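
Because the order of task IDs within a batch does not matter, comparing a scheduler's output to the expected lists above with `==` can reject a correct plan. A minimal sketch of an order-insensitive check follows. `is_valid_plan` is a hypothetical helper, not part of the task or of either answer, and it assumes task IDs are unique.

```python
from typing import Dict, List


def is_valid_plan(tasks: List[dict], plan: List[List[str]]) -> bool:
    """Check that `plan` runs every task exactly once, after its dependencies."""
    deps: Dict[str, List[str]] = {
        t["id"]: t.get("dependencies", []) for t in tasks
    }
    scheduled = [task_id for batch in plan for task_id in batch]
    # Every known task must appear exactly once, and nothing else may appear.
    if sorted(scheduled) != sorted(deps):
        return False
    # Index of the batch in which each task runs.
    batch_of = {task_id: i for i, batch in enumerate(plan) for task_id in batch}
    # Each dependency must be known and sit in a strictly earlier batch.
    return all(
        dep in batch_of and batch_of[dep] < batch_of[task_id]
        for task_id, dep_list in deps.items()
        for dep in dep_list
    )


pipeline = [
    {'id': 'checkout', 'dependencies': []},
    {'id': 'build', 'dependencies': ['checkout']},
    {'id': 'unit_test', 'dependencies': ['build']},
    {'id': 'integration_test', 'dependencies': ['build']},
    {'id': 'deploy', 'dependencies': ['unit_test', 'integration_test']},
]
reordered = [['checkout'], ['build'], ['integration_test', 'unit_test'], ['deploy']]
too_early = [['checkout', 'build'], ['unit_test', 'integration_test'], ['deploy']]

print(is_valid_plan(pipeline, reordered))  # True
print(is_valid_plan(pipeline, too_early))  # False
```

The check accepts any plan that respects the dependencies, including one that puts every task in its own batch, so it does not verify that the batches are as parallel as possible.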

Judging Policy

A high-quality solution must be functionally correct and robust.

Correctness: The generated schedule must accurately reflect the task dependencies. All tasks must be included in the output exactly once, and no task should be scheduled before its dependencies are met.
Error Handling: The solution must correctly identify and raise a ValueError for both circular dependencies and references to non-existent dependency tasks. The error messages should be clear and informative.
Algorithm Choice: The implementation should use an efficient algorithm for topological sorting, such as Kahn's algorithm or a DFS-based approach. Inefficient or brute-force solutions will be judged poorly.
Code Quality: The code should be clean, well-structured, and readable. It should adhere to standard Python conventions (PEP 8). The use of appropriate data structures (e.g., dictionaries for graph representation, sets for efficient lookups) is expected.
Output Format: The output must strictly match the specified format: a list of lists of strings for a valid schedule, or a raised ValueError for an invalid task graph.
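
For comparison with a hand-written Kahn's algorithm, the standard library's `graphlib.TopologicalSorter` (Python 3.9+) can produce the same level-by-level batches. The following is a sketch under assumptions, not something the task requires: `schedule_with_graphlib` is a hypothetical name, and unknown dependencies are checked by hand because `graphlib` treats an unlisted predecessor as an ordinary node rather than an error.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def schedule_with_graphlib(tasks: List[dict]) -> List[List[str]]:
    """Group tasks into batches using the standard-library topological sorter."""
    # Map each task ID to the set of IDs it depends on (its predecessors).
    graph: Dict[str, Set[str]] = {
        t["id"]: set(t.get("dependencies", [])) for t in tasks
    }
    for task_id, deps in graph.items():
        missing = deps.difference(graph)
        if missing:
            raise ValueError(
                f"Task {task_id!r} depends on unknown task(s): {sorted(missing)}"
            )

    sorter = TopologicalSorter(graph)
    try:
        sorter.prepare()  # raises CycleError if the graph contains a cycle
    except CycleError as exc:
        # The detected cycle is the second element of the exception's args.
        raise ValueError(f"Circular dependency detected: {exc.args[1]}") from exc

    plan: List[List[str]] = []
    while sorter.is_active():
        batch = sorted(sorter.get_ready())  # every task that is ready right now
        plan.append(batch)
        sorter.done(*batch)  # marking the batch done unlocks its dependents
    return plan


tasks = [
    {'id': 'A', 'name': 'Data Ingestion', 'dependencies': []},
    {'id': 'B', 'name': 'Data Cleaning', 'dependencies': ['A']},
    {'id': 'C', 'name': 'Feature Engineering', 'dependencies': ['A']},
    {'id': 'D', 'name': 'Model Training', 'dependencies': ['B', 'C']},
    {'id': 'E', 'name': 'Setup Logging', 'dependencies': []},
    {'id': 'F', 'name': 'Model Deployment', 'dependencies': ['D']},
]
print(schedule_with_graphlib(tasks))
# [['A', 'E'], ['B', 'C'], ['D'], ['F']]
```

`CycleError` is already a subclass of `ValueError`; the sketch re-raises it only to attach a message in the same style as the other error.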

Answers

Answer A: Anthropic Claude Fable 5

"""
Dependency-based task scheduler.

Produces an execution plan as a list of batches (lists of task IDs),
where every task in a batch can run concurrently, and batches must run
sequentially. Uses Kahn's algorithm (level-by-level topological sort).

Raises ValueError on:

- duplicate task IDs,
- dependencies that reference unknown task IDs,
- circular dependencies (with the offending tasks listed).
"""

from collections import deque
from typing import Dict, List


class TaskScheduler:
    """Schedules tasks into parallelizable batches based on dependencies."""

    def __init__(self, tasks: List[dict]):
        self.tasks = tasks
        self._validate_and_build_graph()

    def _validate_and_build_graph(self) -> None:
        # Collect all task IDs and check for duplicates.
        self.task_ids = set()
        for task in self.tasks:
            if 'id' not in task:
                raise ValueError(f"Task is missing required 'id' key: {task!r}")
            task_id = task['id']
            if task_id in self.task_ids:
                raise ValueError(f"Duplicate task ID found: '{task_id}'")
            self.task_ids.add(task_id)

        # Build adjacency list (dependency -> dependents) and in-degree map.
        self.dependents: Dict[str, List[str]] = {tid: [] for tid in self.task_ids}
        self.in_degree: Dict[str, int] = {tid: 0 for tid in self.task_ids}

        for task in self.tasks:
            task_id = task['id']
            for dep in task.get('dependencies') or []:
                if dep not in self.task_ids:
                    raise ValueError(
                        f"Task '{task_id}' depends on unknown task ID '{dep}'"
                    )
                if dep == task_id:
                    raise ValueError(
                        f"Task '{task_id}' depends on itself (circular dependency)"
                    )
                self.dependents[dep].append(task_id)
                self.in_degree[task_id] += 1

    def schedule(self) -> List[List[str]]:
        """
        Return the execution plan as a list of batches.

        Each batch is a list of task IDs that can be executed concurrently.
        Batches must be executed in the returned order.
        """
        in_degree = dict(self.in_degree)  # work on a copy
        current_batch = deque(
            tid for tid in in_degree if in_degree[tid] == 0
        )

        plan: List[List[str]] = []
        scheduled_count = 0

        while current_batch:
            batch = sorted(current_batch)  # deterministic output (order is arbitrary)
            plan.append(batch)
            scheduled_count += len(batch)

            next_batch = deque()
            for task_id in batch:
                for dependent in self.dependents[task_id]:
                    in_degree[dependent] -= 1
                    if in_degree[dependent] == 0:
                        next_batch.append(dependent)
            current_batch = next_batch

        if scheduled_count != len(self.task_ids):
            unresolved = sorted(
                tid for tid in in_degree if in_degree[tid] > 0
            )
            raise ValueError(
                "Circular dependency detected involving tasks: "
                + ", ".join(f"'{t}'" for t in unresolved)
            )

        return plan

def schedule_tasks(tasks: List[dict]) -> List[List[str]]:
    """
    Convenience function: schedule a list of task dictionaries.

    Args:
        tasks: List of dicts, each with keys 'id' (str), optionally 'name'
               (str), and 'dependencies' (list of task ID strings).

    Returns:
        A list of lists of task IDs. Each inner list is a batch of tasks
        that can run in parallel; batches run sequentially in order.

    Raises:
        ValueError: If a dependency refers to a non-existent task ID, if a
                    circular dependency exists, or if task IDs are duplicated.
    """
    return TaskScheduler(tasks).schedule()

if __name__ == '__main__':
    # Example 1
    tasks = [
        {'id': 'A', 'name': 'Data Ingestion', 'dependencies': []},
        {'id': 'B', 'name': 'Data Cleaning', 'dependencies': ['A']},
        {'id': 'C', 'name': 'Feature Engineering', 'dependencies': ['A']},
        {'id': 'D', 'name': 'Model Training', 'dependencies': ['B', 'C']},
        {'id': 'E', 'name': 'Setup Logging', 'dependencies': []},
        {'id': 'F', 'name': 'Model Deployment', 'dependencies': ['D']},
    ]
    print(schedule_tasks(tasks))
    # -> [['A', 'E'], ['B', 'C'], ['D'], ['F']]

    # Example 2
    pipeline = [
        {'id': 'checkout', 'dependencies': []},
        {'id': 'build', 'dependencies': ['checkout']},
        {'id': 'unit_test', 'dependencies': ['build']},
        {'id': 'integration_test', 'dependencies': ['build']},
        {'id': 'deploy', 'dependencies': ['unit_test', 'integration_test']},
    ]
    print(schedule_tasks(pipeline))
    # -> [['checkout'], ['build'], ['integration_test', 'unit_test'], ['deploy']]

    # Example 3: circular dependency raises ValueError
    cyclic = [
        {'id': 'X', 'dependencies': ['Y']},
        {'id': 'Y', 'dependencies': ['X']},
    ]
    try:
        schedule_tasks(cyclic)
    except ValueError as e:
        print(f"ValueError: {e}")

    # Example 4: unknown dependency raises ValueError
    broken = [
        {'id': 'A', 'dependencies': ['missing_task']},
    ]
    try:
        schedule_tasks(broken)
    except ValueError as e:
        print(f"ValueError: {e}")

Result

#1 | Winner

Winning Votes

2 / 3

Average Score

Judge Models Anthropic Claude Opus 4.8

Total Score

Overall Comments

Answer A is functionally correct, using Kahn's algorithm with a clean level-by-level (batch) topological sort. It validates duplicate IDs, unknown dependencies, and self-loops, and detects cycles via the scheduled-count check, raising descriptive ValueErrors that even list the offending tasks. Notable strengths: it builds the graph at construction time (so unknown-dependency errors surface eagerly), produces deterministic output via sorting, includes thorough docstrings, type hints, and a comprehensive set of runnable examples covering both valid and invalid cases (cycle and unknown dependency). It also offers both a class API and a convenience function, matching the prompt's flexibility. Minor weaknesses: sorting each batch is a small (and unrequested) overhead, and the eager self-dependency check is slightly redundant given the cycle detector, but neither is harmful.
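
The remark that the eager self-dependency check is redundant can be illustrated with a stripped-down version of the level-by-level algorithm. This is a sketch, not Answer A's code: `kahn_batches` is a hypothetical function that takes a plain mapping from task ID to dependency list and assumes every dependency is a known ID listed at most once.

```python
from typing import Dict, List


def kahn_batches(deps: Dict[str, List[str]]) -> List[List[str]]:
    """Level-by-level Kahn's algorithm with no explicit self-loop check."""
    in_degree = {task_id: len(d) for task_id, d in deps.items()}
    dependents: Dict[str, List[str]] = {task_id: [] for task_id in deps}
    for task_id, dep_list in deps.items():
        for dep in dep_list:
            dependents[dep].append(task_id)

    plan: List[List[str]] = []
    batch = sorted(t for t, count in in_degree.items() if count == 0)
    while batch:
        plan.append(batch)
        unlocked = []
        for task_id in batch:
            for child in dependents[task_id]:
                in_degree[child] -= 1
                if in_degree[child] == 0:
                    unlocked.append(child)
        batch = sorted(unlocked)

    # A task on a cycle (including a self-loop) never reaches in-degree 0,
    # so it is never scheduled and the counts disagree.
    if sum(len(b) for b in plan) != len(deps):
        stuck = sorted(t for t, count in in_degree.items() if count > 0)
        raise ValueError(f"Circular dependency detected involving tasks: {stuck}")
    return plan


try:
    kahn_batches({"A": ["A"], "B": []})
except ValueError as exc:
    print(f"ValueError: {exc}")
# ValueError: Circular dependency detected involving tasks: ['A']
```

The count check alone catches the self-loop; what the explicit check in Answer A adds is a more specific message ("depends on itself") raised before any scheduling work is done.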

Correctness

Weight 35%

Correctly implements level-by-level Kahn's algorithm; valid schedules match the expected outputs, cycles and unknown dependencies both raise ValueError, and the cycle check via scheduled_count is sound. Sorted batches guarantee correct, deterministic results.

Completeness

Weight 20%

Handles duplicates, unknown deps, self-loops, and cycles; provides both class and function APIs; and includes four runnable examples covering valid schedules plus both error types, demonstrating full coverage.

Code Quality

Weight 20%

Clean class structure with clear separation of validation and scheduling, good docstrings, type hints, and informative error messages. Minor redundancy in the explicit self-dependency check.

Practical Value

Weight 15%

Deterministic output and eager validation at construction make it dependable and easy to integrate; runnable examples for all paths aid practical adoption.

Instruction Following

Weight 10%

Matches required output format, raises ValueError for both required error cases with descriptive messages, and provides a function as requested; fully aligns with the prompt.

Judge Models OpenAI GPT-5.4

Total Score

Overall Comments

Answer A is a strong, mostly correct implementation using level-by-level Kahn's algorithm, and it cleanly returns parallelizable batches. It handles unknown dependencies, duplicate IDs, and cycles with clear errors, and the code is readable and well structured. Its main weakness is that it performs less input validation than Answer B and is slightly less robust for malformed inputs beyond the core prompt requirements.

Correctness

Weight 35%

Implements batch-wise topological sorting correctly and detects unknown dependencies and cycles. It also catches self-dependency explicitly. A minor limitation is that malformed dependency field types are not validated and could lead to unintended behavior rather than a clear contract-preserving error.
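
One concrete case of the "unintended behavior" mentioned above: if `dependencies` is a string rather than a list, a loop such as `for dep in task.get('dependencies') or []` iterates over its characters. The guard below is a minimal sketch of how that could be rejected. `checked_dependencies` is a hypothetical helper and does not appear in either answer.

```python
from typing import List


def checked_dependencies(task: dict) -> List[str]:
    """Return the task's dependency list, rejecting anything but a list of str."""
    deps = task.get("dependencies") or []
    if not isinstance(deps, list) or not all(isinstance(d, str) for d in deps):
        raise ValueError(
            f"Task {task.get('id')!r}: 'dependencies' must be a list of strings, "
            f"got {deps!r}"
        )
    return deps


# A str is iterable, so an unguarded loop sees one "dependency" per character.
malformed = {"id": "C", "dependencies": "AB"}
print(list(malformed["dependencies"]))  # ['A', 'B']

try:
    checked_dependencies(malformed)
except ValueError as exc:
    print(f"ValueError: {exc}")
```

Without the guard, the malformed task above would be scheduled after tasks 'A' and 'B' if both exist, or reported as depending on an unknown task named after a single character.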

Completeness

Weight 20%

Covers all required behaviors and even adds duplicate-ID handling and examples. However, it is less complete around malformed input validation such as non-list dependencies or non-string IDs beyond missing 'id'.

Code Quality

Weight 20%

Well organized with a clear class design, helper method, type hints, docstrings, and deterministic sorted batches. The structure is easy to follow and maintain.

Practical Value

Weight 15%

Useful in practice, especially with deterministic batch ordering and descriptive errors. It is somewhat less defensive against malformed inputs, which lowers robustness in production contexts.

Instruction Following

Weight 10%

Follows the prompt closely: accepts task dictionaries, returns list-of-lists batches, and raises ValueError for cycles and missing dependencies. The implementation and examples align well with the required answer type.

Judge Models Google Gemini 2.5 Pro

Total Score

Overall Comments

Answer A provides an excellent, professional-quality solution. It uses a well-structured class-based design that cleanly separates graph building and validation from the scheduling logic. The implementation of Kahn's algorithm is correct and efficient. Error handling is robust, covering all specified cases plus additional ones like duplicate task IDs. The code is clean, well-documented with type hints, and includes a comprehensive if __name__ == '__main__' block that serves as a set of mini-tests, demonstrating both successful execution and error handling.

Correctness

Weight 35%

100

The implementation of Kahn's algorithm is flawless. It correctly identifies batches of parallelizable tasks and respects all dependencies, producing the correct output for the given examples.

Completeness

Weight 20%

The solution correctly handles all specified requirements, including detection of circular dependencies and non-existent dependencies. It also adds useful checks for duplicate task IDs and self-dependencies.

Code Quality

Weight 20%

The code quality is excellent. The class-based design provides great structure and separation of concerns. The code is clean, readable, well-documented, and uses type hints effectively. The `if __name__ == '__main__'` block is comprehensive and demonstrates the code's full functionality.

Practical Value

Weight 15%

The solution is highly practical. The class-based design makes it easy to reuse and extend. An instance of the scheduler could be created and its graph inspected or reused, which is a common requirement in real-world applications.

Instruction Following

Weight 10%

100

The answer perfectly follows all instructions, including the input/output format, the choice of a function or class, and the specific error handling requirements.

Answer B: OpenAI GPT-5.5

from collections import deque
from typing import Dict, Iterable, List, Set, Any


def schedule_tasks(tasks: List[Dict[str, Any]]) -> List[List[str]]:
    """
    Build a dependency-based execution plan for a list of tasks.

    Each returned inner list is a batch of task IDs that can be run concurrently.
    Batches must be executed in the order returned.

    Args:
        tasks: A list of dictionaries. Each task must contain:
            - 'id': unique string task identifier
            - 'dependencies': list of task IDs that must complete first

    Returns:
        A list of batches, where each batch is a list of task IDs.

    Raises:
        ValueError: If task IDs are duplicated, dependencies are malformed,
                    a dependency references an unknown task, or a cycle exists.
    """
    if not isinstance(tasks, list):
        raise ValueError("tasks must be a list of task dictionaries")

    task_ids: Set[str] = set()

    for task in tasks:
        if not isinstance(task, dict):
            raise ValueError("each task must be a dictionary")

        task_id = task.get("id")
        if not isinstance(task_id, str) or not task_id:
            raise ValueError("each task must have a non-empty string 'id'")

        if task_id in task_ids:
            raise ValueError(f"duplicate task id found: {task_id!r}")

        dependencies = task.get("dependencies", [])
        if not isinstance(dependencies, list):
            raise ValueError(f"dependencies for task {task_id!r} must be a list")

        for dependency_id in dependencies:
            if not isinstance(dependency_id, str) or not dependency_id:
                raise ValueError(
                    f"dependencies for task {task_id!r} must contain only non-empty strings"
                )

        task_ids.add(task_id)

    # adjacency maps each task to the tasks that become closer to runnable after it completes.
    adjacency: Dict[str, List[str]] = {task_id: [] for task_id in task_ids}

    # in_degree counts how many unfinished dependencies each task has.
    in_degree: Dict[str, int] = {task_id: 0 for task_id in task_ids}

    for task in tasks:
        task_id = task["id"]
        dependencies = task.get("dependencies", [])

        # Treat duplicate dependency IDs on a single task as an invalid input because they
        # otherwise create ambiguous bookkeeping and usually indicate a data error.
        seen_dependencies: Set[str] = set()

        for dependency_id in dependencies:
            if dependency_id not in task_ids:
                raise ValueError(
                    f"task {task_id!r} depends on unknown task id {dependency_id!r}"
                )

            if dependency_id in seen_dependencies:
                raise ValueError(
                    f"task {task_id!r} lists dependency {dependency_id!r} more than once"
                )
            seen_dependencies.add(dependency_id)

            adjacency[dependency_id].append(task_id)
            in_degree[task_id] += 1

    ready = deque(task_id for task_id in task_ids if in_degree[task_id] == 0)
    execution_plan: List[List[str]] = []
    scheduled_count = 0

    while ready:
        batch = list(ready)
        ready.clear()
        execution_plan.append(batch)
        scheduled_count += len(batch)

        for completed_task_id in batch:
            for dependent_task_id in adjacency[completed_task_id]:
                in_degree[dependent_task_id] -= 1
                if in_degree[dependent_task_id] == 0:
                    ready.append(dependent_task_id)

    if scheduled_count != len(task_ids):
        cyclic_task_ids = sorted(
            task_id for task_id, remaining_dependencies in in_degree.items()
            if remaining_dependencies > 0
        )
        raise ValueError(
            "circular dependency detected involving task(s): "
            + ", ".join(cyclic_task_ids)
        )

    return execution_plan

class TaskScheduler:
    """Small convenience wrapper around schedule_tasks()."""

    def __init__(self, tasks: List[Dict[str, Any]]):
        self.tasks = tasks

    def execution_plan(self) -> List[List[str]]:
        return schedule_tasks(self.tasks)

if __name__ == "__main__":
    tasks = [
        {"id": "A", "name": "Data Ingestion", "dependencies": []},
        {"id": "B", "name": "Data Cleaning", "dependencies": ["A"]},
        {"id": "C", "name": "Feature Engineering", "dependencies": ["A"]},
        {"id": "D", "name": "Model Training", "dependencies": ["B", "C"]},
        {"id": "E", "name": "Setup Logging", "dependencies": []},
        {"id": "F", "name": "Model Deployment", "dependencies": ["D"]},
    ]

    print(schedule_tasks(tasks))
    # Example output: [['A', 'E'], ['B', 'C'], ['D'], ['F']]
    # The order within each batch may differ because tasks in a batch are concurrent.

Result

Winning Votes

1 / 3

Average Score

Judge Models Anthropic Claude Opus 4.8

Total Score

Overall Comments

Answer B is also functionally correct, using Kahn's algorithm with batch grouping. It has the most thorough input validation: it checks that tasks is a list, each item is a dict, each id is a non-empty string, dependencies is a list, and that dependency entries are non-empty strings. It also rejects duplicate dependency IDs on a single task. Cycle detection and unknown-dependency detection are correct with clear messages. Weaknesses: the unknown-dependency check happens in the second pass rather than at validation, which is fine but slightly less tidy; output ordering within batches is non-deterministic (depends on set iteration), which is acceptable per the spec but less reproducible than A. Its example coverage is thinner—only one valid example, with no demonstration of the error paths. The TaskScheduler wrapper is a thin pass-through.
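
The non-determinism noted here comes from seeding the ready queue by iterating a `set` of strings, whose iteration order can change between interpreter runs because string hashing is randomized by default. Sorting each batch, as Answer A does, is one fix. Another is to iterate in input order, sketched below. `ready_in_input_order` is a hypothetical helper, not part of Answer B, and it assumes the task list has already been validated.

```python
from typing import Dict, List


def ready_in_input_order(tasks: List[dict]) -> List[str]:
    """Return IDs of tasks with no dependencies, in the order they were given."""
    # A dict preserves insertion order (guaranteed since Python 3.7); a set does not.
    in_degree: Dict[str, int] = {
        t["id"]: len(t.get("dependencies", [])) for t in tasks
    }
    return [task_id for task_id, count in in_degree.items() if count == 0]


tasks = [
    {"id": "E", "dependencies": []},
    {"id": "B", "dependencies": ["A"]},
    {"id": "A", "dependencies": []},
]
print(ready_in_input_order(tasks))  # ['E', 'A']
```

Keeping input order avoids the per-batch sort, at the cost of making the output depend on how the caller ordered the task list.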

Correctness

Weight 35%

Correctly implements Kahn's algorithm with proper batch grouping, cycle detection, and unknown-dependency detection. Output is correct but batch ordering depends on set iteration, making it non-deterministic though still spec-valid.

Completeness

Weight 20%

Handles duplicates, malformed inputs, unknown deps, duplicate per-task deps, and cycles—arguably broader validation. However it demonstrates only one valid example and shows no error-path examples, leaving behavior less illustrated.

Code Quality

Weight 20%

Very readable single-function design with descriptive variable names, thorough docstrings, and inline rationale comments. The TaskScheduler wrapper adds little, but overall structure is clean and idiomatic.

Practical Value

Weight 15%

Robust defensive validation makes it safe against malformed inputs, but non-deterministic batch ordering and fewer demonstrations slightly reduce practical reproducibility.

Instruction Following

Weight 10%

Matches required output format and raises descriptive ValueErrors for both required error cases via a function as requested; fully compliant though with thinner demonstration of requirements.

Judge Models OpenAI GPT-5.4

Total Score

Overall Comments

Answer B is a robust and correct implementation of Kahn's algorithm that returns concurrent execution batches and raises clear ValueError exceptions for invalid graphs. It adds stronger validation for task shape and dependency types, keeps the code clean and practical, and includes a convenient wrapper class. Its only notable minor drawback is that batch ordering is not deterministic because it iterates from a set-backed collection.

Correctness

Weight 35%

Correctly computes parallel batches with Kahn's algorithm and detects unknown dependencies and cycles. It also validates IDs and dependency entries more thoroughly, reducing failure modes. Minor deduction because batch order is nondeterministic due to set iteration, though order within a batch is allowed to vary.

Completeness

Weight 20%

Fully addresses the requested scheduling behavior, output format, cycle detection, and missing dependency errors, while also covering duplicate IDs, malformed tasks, malformed dependency lists, and duplicate dependencies. This makes the solution more complete in real-world usage.

Code Quality

Weight 20%

Clean, readable, and idiomatic code with good naming, type hints, and focused logic. Slightly less polished than A because iteration order is left nondeterministic and the added wrapper class is thinner in design.

Practical Value

Weight 15%

Very practical due to stronger input validation, clear exceptions, and direct function usage with an optional wrapper class. It is well suited to real inputs that may be messy or inconsistent.

Instruction Following

Weight 10%

Follows the prompt very closely and returns the required structure while using an efficient algorithm and appropriate error handling. It also stays focused on code as requested and aligns well with the specification.

Judge Models Google Gemini 2.5 Pro

Total Score

Overall Comments

Answer B provides a functionally correct solution that also uses Kahn's algorithm. Its main strength is its extremely thorough and paranoid input validation, which checks for a wide range of potential data errors beyond the prompt's requirements. However, its code structure is weaker than Answer A's. The core logic is contained in a single, long function, which is less modular and readable. The accompanying TaskScheduler class is a thin wrapper that adds little value. The example usage in the main block is also less comprehensive, failing to demonstrate the robust error handling implemented in the code.

Correctness

Weight 35%

100

The implementation of Kahn's algorithm is flawless. It correctly identifies batches of parallelizable tasks and respects all dependencies, producing the correct output for the given examples.

Completeness

Weight 20%

The solution correctly handles all specified requirements. Its input validation is exceptionally thorough, checking not only for the required errors but also for data types, empty strings, and duplicate dependencies within a single task's list, making it slightly more complete in its robustness.

Code Quality

Weight 20%

The code is functionally correct but less well-structured. The logic is packed into a single large function, which harms readability and maintainability. The `TaskScheduler` class feels like an afterthought and adds no real value. The testing block is minimal and doesn't showcase the extensive error handling.

Practical Value

Weight 15%

The solution is practical due to its correctness and robust validation. However, the monolithic function design makes it less flexible and reusable in a larger software system compared to a well-designed class.

Instruction Following

Weight 10%

100

The answer perfectly follows all instructions, including the input/output format, the choice of a function or class, and the specific error handling requirements.

Comparison Summary

Final rank order is determined by judge-wise rank aggregation (average rank + Borda tie-break). Average score is shown for reference.

Judges: 3

Winner: Anthropic Claude Fable 5

Winning Votes

2 / 3

Average Score

OpenAI GPT-5.5

Winning Votes

1 / 3

Average Score

Judging Results

Judge Models Google Gemini 2.5 Pro

Claude Fable 5: 96 (Winner)

GPT-5.5: 91

Why This Side Won

Answer A is the winner due to its superior code quality and software design. The use of a dedicated class to encapsulate the state and logic of the scheduler is a much cleaner, more maintainable, and more reusable approach than Answer B's monolithic function. While Answer B features slightly more exhaustive input validation, Answer A's overall structure is significantly better and more aligned with object-oriented principles. Furthermore, Answer A's inclusion of a comprehensive testing block that demonstrates its error-handling capabilities makes it a more complete and professional submission.

Judge Models OpenAI GPT-5.4

Claude Fable 5: 88

GPT-5.5: 91 (Winner)

Why This Side Won

Answer B wins because both answers are functionally strong, but B scores higher on the more heavily weighted correctness and completeness dimensions by providing broader validation and similarly solid cycle and missing-dependency handling. Although Answer A has slightly more deterministic output within batches, Answer B's stronger robustness and practical input checks give it the higher weighted overall result.

Judge Models Anthropic Claude Opus 4.8

Claude Fable 5: 88 (Winner)

GPT-5.5: 86

Why This Side Won

Both answers are correct and use Kahn's algorithm appropriately, so correctness is near-tied. A wins narrowly on the heavily weighted correctness/completeness/code-quality cluster due to richer demonstration of all required behaviors (valid schedule, cycle ValueError, and unknown-dependency ValueError are all shown executing), deterministic batch output that makes results reproducible, and eager graph validation at construction. B's stronger defensive input validation is valuable but addresses cases beyond the spec, while A more thoroughly exercises and documents the exact required scenarios, giving it a slight edge on practical value and completeness.
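
The Comparison Summary says the final order comes from judge-wise rank aggregation (average rank, with a Borda tie-break). The page does not spell out the exact procedure, so the following is only a sketch of the average-rank step applied to the three per-judge totals listed above. `average_ranks` is a hypothetical function, ties within a single judge are not handled, and the Borda tie-break is omitted because it is not needed for these numbers.

```python
from typing import Dict, List

# Per-judge totals as listed under Judging Results.
scores: Dict[str, Dict[str, int]] = {
    "Google Gemini 2.5 Pro": {"Claude Fable 5": 96, "GPT-5.5": 91},
    "OpenAI GPT-5.4": {"Claude Fable 5": 88, "GPT-5.5": 91},
    "Anthropic Claude Opus 4.8": {"Claude Fable 5": 88, "GPT-5.5": 86},
}


def average_ranks(scores: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Rank the answers within each judge (1 = best), then average per answer."""
    ranks: Dict[str, List[int]] = {}
    for per_answer in scores.values():
        ordered = sorted(per_answer, key=per_answer.get, reverse=True)
        for rank, answer in enumerate(ordered, start=1):
            ranks.setdefault(answer, []).append(rank)
    return {answer: sum(r) / len(r) for answer, r in ranks.items()}


avg = average_ranks(scores)
winner = min(avg, key=avg.get)  # the lowest average rank wins
print(winner)  # Claude Fable 5
```

Claude Fable 5 is ranked first by two judges and second by one (average rank 4/3), while GPT-5.5 is ranked first by one and second by two (average rank 5/3), which matches the 2 / 3 and 1 / 3 winning votes reported above.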
