Design a Real-Time Collaborative Whiteboard System

Compare model answers for this System Design benchmark and review scores, judging comments, and related examples.

Task Overview

Benchmark Genres

System Design

Task Creator Model: The task creator is randomly selected from the top task-generation models of supported providers.

Google Gemini 2.5 Pro

Answering Models: In this benchmark, models from the same provider as the task creator are excluded from answering.

Answer A: Anthropic Claude Opus 4.8

Answer B: OpenAI GPT-5.4

Judge Models: Judging uses exactly 3 judge models, excluding the answering models. At least 1 judge is selected from flagship models, lightweight models are not selected as judges, and the 3 judges come from 3 distinct providers.

OpenAI GPT-5.5, Anthropic Claude Sonnet 4.6, Google Gemini 2.5 Flash

Task Prompt

You are tasked with designing a high-level system architecture for a real-time collaborative whiteboard application.

Core Requirements:

Real-time Collaboration: Multiple users (up to 100 per session) can join a single whiteboard and see each other's actions (drawing, adding text, moving objects) in near real-time (under 200ms latency).
Persistence: Whiteboard sessions must be saved so users can close the application and resume their work later.
Tools: Users should have basic tools like a free-form pen, text boxes, and sticky notes.

Scale and Reliability Constraints:

Support up to 10,000 concurrent active whiteboard sessions.
Support up to 1,000,000 total users.
The service must be highly available, with 99.9% uptime.

Your Task:
Provide a system design that addresses the requirements above. Your response should cover:

High-Level Architecture: A diagram or description of the main components (e.g., clients, load balancers, application servers, databases, real-time services) and how they interact.
Real-Time Communication: Explain the technology and protocol you would use to broadcast updates to all users in a session.
Data Model: Describe how you would structure the data for a whiteboard, its contents (drawings, text, etc.), and user sessions.
Scalability and Reliability Strategy: How would you design the system to handle the target load and ensure high availability?
Trade-offs: Discuss one major trade-off you made in your design (e.g., consistency vs. latency, choice of database, etc.).

Task Context

This is a classic system design problem that tests understanding of real-time communication, state management, and distributed systems. It is similar to designing applications like Miro, FigJam, or Google Docs.

Judging Policy

A high-quality answer will propose a coherent and logical system architecture. The evaluation should focus on:

Architecture Viability: The proposed architecture should be practical and use appropriate components for the task. Key services like API gateways, application servers, real-time messaging systems (e.g., using WebSockets), and databases should be included and their roles explained.
Real-Time Solution: The choice of technology for real-time communication (e.g., WebSockets, WebRTC) must be well-justified. The explanation of how updates are propagated should be clear.
Data Modeling: The data model should be sensible for storing various types of whiteboard objects and their properties (position, color, content). It should be efficient for both writing updates and loading a full board.
Scalability and Reliability: The answer must include concrete strategies for scaling (e.g., horizontal scaling of servers, database replication/sharding) and reliability (e.g., redundancy, failover mechanisms).
Trade-off Analysis: The discussion of trade-offs should be insightful, demonstrating an understanding of the design's implications (e.g., choosing a NoSQL database for flexibility and scalability at the cost of transactional consistency).

Answers

Answer A: Anthropic Claude Opus 4.8

Real-Time Collaborative Whiteboard — System Design

1. High-Level Architecture

Components and request/data flow:

                       ┌──────────────┐
   Web/Mobile/Desktop  │   Clients    │  (Canvas rendering, local CRDT replica,
        Clients ─────► │  (100/board) │   WebSocket client, offline buffer)
                       └──────┬───────┘
                              │ HTTPS (REST) + WSS (WebSocket)
                       ┌──────▼───────┐
                       │     CDN      │  (static assets, exported images)
                       └──────┬───────┘
                       ┌──────▼───────────────┐
                       │  Global Load Balancer │ (L7, TLS termination,
                       │   + API Gateway       │  auth, rate limiting)
                       └───┬───────────────┬───┘
            REST traffic   │               │  WS upgrade (sticky by sessionId)
            ┌──────────────▼──┐      ┌─────▼───────────────┐
            │  Stateless App   │      │  Realtime Collab     │
            │  Services        │      │  Servers (WS)        │
            │ (auth, board     │      │  - hold in-memory    │
            │  CRUD, sharing,  │      │    board state       │
            │  exports)        │      │  - merge ops (CRDT)  │
            └───┬────────┬─────┘      │  - broadcast deltas  │
                │        │            └───┬───────────┬──────┘
        ┌───────▼──┐ ┌───▼─────┐     ┌────▼────┐  ┌───▼────────┐
        │ Metadata │ │ Object/ │     │ Redis   │  │ Session    │
        │   DB     │ │ Blob    │     │ Pub/Sub │  │ Routing    │
        │(Postgres)│ │ Store(S3)│    │+Presence│  │(Consistent │
        └──────────┘ └─────────┘     └─────────┘  │  hashing)  │
                │                                  └────────────┘
        ┌───────▼─────────────┐   ┌─────────────────────────────┐
        │ Document/Op Store    │   │ Async Workers (Kafka queue) │
        │ (DynamoDB/Cassandra: │◄──│ - snapshotting              │
        │  ops log + snapshots) │   │ - thumbnail/export gen      │
        └─────────────────────┘    │ - analytics                 │
                                    └─────────────────────────────┘

Interaction summary: Clients authenticate via the API Gateway (REST), then open a persistent WebSocket to a Realtime Collab server. The gateway uses consistent hashing on sessionId so that all participants of one board land on the same server (or a small replica set), keeping the authoritative live state in one place. App Services handle non-real-time CRUD (creating boards, sharing, listing, exports). Redis Pub/Sub bridges Realtime servers so that if participants are split across instances, ops still propagate. Async workers periodically persist snapshots and the op log to durable storage.
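
As an illustration of the routing step described above, here is a minimal sketch of a consistent-hash ring that maps a sessionId to a Realtime server. The class name `HashRing`, the node names, the virtual-node count, and the use of MD5 as the ring hash are all assumptions made for the example; the answer does not specify them.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a ring position with a stable hash (MD5 is an assumption)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, nodes, vnodes: int = 100):
        self._vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        """Place `vnodes` points for this node on the ring."""
        for i in range(self._vnodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove(self, node: str) -> None:
        """Remove every point that belongs to this node."""
        for i in range(self._vnodes):
            point = _hash(f"{node}#{i}")
            self._points.remove(point)
            del self._owner[point]

    def node_for(self, session_id: str) -> str:
        """Return the owner of the first point clockwise from the session's hash."""
        idx = bisect.bisect_right(self._points, _hash(session_id)) % len(self._points)
        return self._owner[self._points[idx]]
```

With a ring like this, removing one server only remaps the sessions that server owned; every other board keeps its node, which is the property that makes sticky routing by sessionId cheap to maintain during scale-in or failure.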

2. Real-Time Communication

Protocol: WebSocket (WSS) for full-duplex, low-latency bidirectional messaging. Falls back to HTTP long-polling via a library like Socket.IO for restrictive networks. WebRTC data channels are considered for cursor/presence peer-to-peer, but a server-relayed model is chosen for simplicity and reliability.
Message model: Clients send small operations/deltas (e.g., {type:'stroke_add', objId, points, color}, {type:'obj_move', objId, dx, dy}) rather than full board state. The server validates, assigns a sequence/version, merges, and broadcasts the delta to all other session members.
Fan-out: Each Realtime server keeps the connection set per board in memory and broadcasts deltas directly. For boards whose members span multiple servers, the originating server publishes the op to a Redis Pub/Sub channel keyed by sessionId; subscribed servers re-broadcast to their local connections.
Presence & cursors: High-frequency, low-value data (live cursor positions, selections) is throttled (~30–60ms) and sent best-effort, never persisted.
Latency target (<200ms): Achieved via regional Realtime clusters, sticky routing (no cross-region hops), tiny binary/compact JSON payloads, and optimistic local rendering (client applies its own op immediately, then reconciles).
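
The message model and single-node fan-out described in this section could look roughly like the sketch below. `BoardRoom`, the `Send` callable (a stand-in for writing to a client's WebSocket), and the exact fields added to the stamped op are assumptions for illustration; cross-server fan-out through Redis Pub/Sub is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

Send = Callable[[dict], None]  # stand-in for "write to this client's WebSocket"


@dataclass
class BoardRoom:
    """In-memory state for one board on one Realtime server (hypothetical)."""
    session_id: str
    next_seq: int = 1
    connections: Dict[str, Send] = field(default_factory=dict)

    def join(self, user_id: str, send: Send) -> None:
        self.connections[user_id] = send

    def leave(self, user_id: str) -> None:
        self.connections.pop(user_id, None)

    def handle_op(self, sender_id: str, op: dict) -> dict:
        """Validate, stamp a sequence number, and broadcast to everyone else."""
        if "type" not in op or "objId" not in op:
            raise ValueError("op must carry 'type' and 'objId'")
        stamped = {**op, "seq": self.next_seq, "userId": sender_id}
        self.next_seq += 1
        for user_id, send in self.connections.items():
            if user_id != sender_id:  # the sender already rendered it optimistically
                send(stamped)
        return stamped  # returned so the caller can ack the sender
```

The sender is skipped in the broadcast loop because, under the optimistic-rendering approach named above, it has already applied its own op locally and only needs the assigned sequence number back.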

3. Data Model

Board metadata (Postgres — relational, transactional):

boards(board_id, owner_id, title, created_at, updated_at, latest_snapshot_id)
users(user_id, name, email, ...)
board_permissions(board_id, user_id, role[owner|editor|viewer])
sessions(session_id, board_id, started_at, active_user_count)

Board content (DynamoDB/Cassandra — high write throughput, append-friendly):

Op log: partition key board_id, sort key version (monotonic). Each row is one operation {op_type, object_id, payload, user_id, timestamp}.
Snapshots: periodic materialized full-state blobs {board_id, snapshot_version, state_json/binary} stored in object storage (S3) with a pointer row. Loading a board = latest snapshot + replay of ops since that snapshot version.
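
The "latest snapshot + replay of ops since that snapshot version" load path can be sketched as follows. The three op types (`add`, `update`, `delete`) and the state shape `{object_id: object_dict}` are simplifying assumptions; the answer only names the row fields.

```python
import copy


def apply_op(state: dict, op: dict) -> None:
    """Apply one op to a board state shaped as {object_id: object_dict}."""
    kind, obj_id = op["op_type"], op["object_id"]
    if kind == "add":
        state[obj_id] = dict(op["payload"])
    elif kind == "update":
        if obj_id in state:  # ignore updates to objects that no longer exist
            state[obj_id].update(op["payload"])
    elif kind == "delete":
        state.pop(obj_id, None)
    else:
        raise ValueError(f"unknown op_type: {kind}")


def load_board(snapshot: dict, op_log: list) -> dict:
    """Rebuild current state: start from the snapshot, replay newer ops in version order."""
    state = copy.deepcopy(snapshot["state"])  # never mutate the stored snapshot
    tail = [op for op in op_log if op["version"] > snapshot["snapshot_version"]]
    for op in sorted(tail, key=lambda op: op["version"]):
        apply_op(state, op)
    return state
```

Ops at or below `snapshot_version` are skipped because the snapshot already reflects them, which is what bounds replay cost.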

Object structure within a board:

WhiteboardObject {
  id, type: "stroke" | "text" | "sticky",
  layer/zIndex,
  geometry: { x, y, width, height, rotation },
  props: {  // type-specific
    stroke:  { points:[...], color, thickness },
    text:    { content, font, color },
    sticky:  { content, bgColor }
  },
  createdBy, lastModified, version
}

Conflict resolution: Use a CRDT (e.g., a list/map CRDT like those in Yjs/Automerge) or OT for the object set, so concurrent edits (two users moving/editing different or same objects) converge deterministically without a central lock. Each object carries a logical clock for last-writer-wins on conflicting property updates.
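
To make the last-writer-wins rule concrete, here is a minimal sketch of a per-property merge keyed on a logical clock. It is not Yjs or Automerge and does not cover list/sequence CRDTs; the `Stamped` type, the per-property (rather than per-object) clock, and the actor-id tie-break are assumptions added for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Stamped:
    """A property value tagged with the logical clock and actor that wrote it."""
    value: Any
    clock: int   # Lamport-style logical clock
    actor: str   # tie-breaker when two writes share a clock value

    def key(self) -> Tuple[int, str]:
        return (self.clock, self.actor)


def merge_props(a: Dict[str, Stamped], b: Dict[str, Stamped]) -> Dict[str, Stamped]:
    """Merge two replicas of an object's properties; the highest (clock, actor) wins."""
    merged = dict(a)
    for name, theirs in b.items():
        ours = merged.get(name)
        if ours is None or theirs.key() > ours.key():
            merged[name] = theirs
    return merged
```

Because the winner is chosen by a total order on `(clock, actor)`, the merge gives the same result regardless of which replica is merged into which, and re-merging an already-seen state changes nothing. Those two properties are what let replicas converge without a central lock.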

Large/binary assets (uploaded images, exported PNG/PDF) live in S3-style blob storage, referenced by URL in the object.

4. Scalability and Reliability Strategy

Scaling to 10k sessions / 1M users:

Stateless app tier: Horizontally autoscaled behind the load balancer; trivial to add nodes.
Realtime tier: Sharded by sessionId via consistent hashing. With 100 users max per board, 10k sessions = up to ~1M concurrent WS connections. A single tuned node handles ~10–50k connections; fan-out is sized so each board's members concentrate on one node. Autoscale on connection count and CPU.
Redis: Clustered, used for Pub/Sub fan-out and presence; partitioned by session.
Databases: Postgres with read replicas for metadata; DynamoDB/Cassandra scale horizontally for the high-volume op log. Object storage scales effectively infinitely.
Snapshotting: Async workers compact the op log into snapshots on a cadence (e.g., every N ops or T seconds) to bound replay cost and storage.
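
The connection arithmetic in the Realtime tier bullet can be checked with a few lines. The per-node figures are the answer's own 10k to 50k range; the helper name `nodes_needed` is hypothetical, and the result ignores headroom for failover or uneven load.

```python
import math

MAX_SESSIONS = 10_000          # concurrent active sessions (from the task prompt)
MAX_USERS_PER_SESSION = 100    # per-board cap (from the task prompt)


def nodes_needed(connections: int, per_node: int) -> int:
    """Minimum number of Realtime nodes for a given per-node connection capacity."""
    return math.ceil(connections / per_node)


worst_case_connections = MAX_SESSIONS * MAX_USERS_PER_SESSION  # 1,000,000

# Roughly 20 nodes at 50k connections each, up to 100 nodes at 10k each.
fewest_nodes = nodes_needed(worst_case_connections, 50_000)
most_nodes = nodes_needed(worst_case_connections, 10_000)
```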

High availability (99.9%):

Multi-AZ deployment for every tier; multi-region for disaster recovery.
Realtime servers run with a warm standby/replica per board; on node failure, clients auto-reconnect and the new node rehydrates state from the latest snapshot + op log (last few seconds of in-memory ops may be re-sent by clients from their local buffer).
Health checks + automated failover at the load balancer.
Backpressure and rate limiting to protect against op storms.
Durable op log gives a recovery point — no committed work is lost even on server crash.

5. Trade-offs

Chosen trade-off: Latency/availability over strong consistency (AP over CP), using CRDTs.

I prioritize sub-200ms responsiveness and offline tolerance by letting clients apply edits optimistically and resolving conflicts with CRDTs/last-writer-wins, accepting eventual consistency. The benefit: the UI feels instant, users can edit during brief network partitions, and the system stays available during failovers. The cost: in rare concurrent-edit cases the converged result may not match any single user's intuitive expectation (e.g., simultaneous edits to the same text), and CRDT metadata adds memory/storage overhead. For a whiteboard this is the right call — creative collaboration values fluidity and availability far more than the strict serialized consistency a transactional/locking (CP) approach would impose, which would add round-trip latency and reduce availability during partitions.

Secondary trade-off: keeping authoritative live state in-memory on a single Realtime node per board (fast broadcasts, simple merge) versus a fully stateless tier (more resilient but higher latency). Mitigated with snapshots + op log so any node can rebuild state on failover.

Result

#1 | Winner

Winning Votes

3 / 3

Average Score

Judge Model: Google Gemini 2.5 Flash

Total Score

Overall Comments

Answer A provides a highly detailed and well-structured system design. Its text-based architecture diagram is exceptionally clear and comprehensive, immediately conveying the system's complexity and component interactions. The discussion of real-time communication, data modeling, and scalability is robust, with specific technology choices and thoughtful explanations. The trade-off analysis is particularly strong, demonstrating a deep understanding of the implications of design decisions, especially regarding consistency and latency with CRDTs.

Architecture Quality

Weight 30%

The text-based architecture diagram is exceptionally clear, comprehensive, and immediately conveys the system's structure and component interactions. Specific technologies are well-integrated and their roles are clearly defined.

Completeness

Weight 20%

Answer A covers all core requirements and constraints comprehensively, providing detailed explanations for each section of the prompt. The data model section is particularly well-structured.

Trade-off Reasoning

Weight 20%

Answer A provides an excellent and well-justified major trade-off (Latency/availability over strong consistency using CRDTs), explicitly framing it as AP over CP. The inclusion of a secondary trade-off further demonstrates a deep understanding of design implications.

Scalability & Reliability

Weight 20%

Answer A presents a robust strategy for both scalability and reliability, detailing horizontal scaling, sharding, multi-AZ/multi-region deployments, warm standbys, and durable op logs. It's very comprehensive.

Clarity

Weight 10%

The answer is exceptionally clear, well-structured with distinct headings, and easy to follow. The text diagram significantly enhances the clarity of the architecture.

Judge Model: Anthropic Claude Sonnet 4.6

Total Score

Overall Comments

Answer A is a highly detailed, well-structured system design that covers all required sections with depth and precision. It includes an ASCII architecture diagram, clearly explains component interactions, justifies technology choices (WebSockets, CRDTs, DynamoDB/Cassandra), provides a concrete data model with schema examples, and discusses both primary and secondary trade-offs. The CRDT discussion is particularly strong, showing deep understanding of distributed systems. The latency strategy is concrete and multi-layered. Minor weakness: the diagram is somewhat complex and could be clearer, but overall this is a strong, benchmark-quality response.

Architecture Quality

Weight 30%

A provides a detailed ASCII diagram with explicit component roles, consistent hashing for session routing, Redis Pub/Sub for cross-node fan-out, and clear separation of stateless app tier from stateful real-time tier. Component interactions are well-explained with specific technology choices justified. Minor complexity in the diagram but overall excellent.

Completeness

Weight 20%

A covers all five required sections thoroughly: architecture with diagram, real-time communication with protocol justification and fallback, data model with schema details and CRDT mention, scalability with concrete numbers, and two trade-offs. Large/binary asset handling is also addressed. Very complete.

Trade-off Reasoning

Weight 20%

A's trade-off discussion is insightful and specific: AP vs CP framing, CRDT metadata overhead, the implication for user experience, and a secondary trade-off about stateful vs stateless real-time tier. Demonstrates genuine understanding of distributed systems implications.

Scalability & Reliability

Weight 20%

A provides concrete scaling math (10k sessions, 1M WS connections, 10-50k connections per node), multi-AZ + multi-region strategy, snapshotting cadence details, backpressure mechanisms, and a clear failover rehydration path. Very thorough.

Clarity

Weight 10%

A is well-organized with clear section headers, a detailed diagram, and code-style schema examples. The ASCII diagram is somewhat dense but readable. The writing is precise and technical without being verbose.

Judge Model: OpenAI GPT-5.5

Total Score

Overall Comments

Answer A provides a highly coherent and practical architecture with clear separation between REST services, real-time WebSocket collaboration servers, persistence, metadata storage, pub/sub, and async workers. It gives a strong data model, explicitly addresses worst-case connection scale, explains snapshot plus operation-log persistence, and offers a thoughtful consistency-versus-latency trade-off. Its main weakness is some ambiguity around exactly when operations become durably committed versus asynchronously persisted, but overall it is very complete and implementation-oriented.

Architecture Quality

Weight 30%

The architecture is well-structured and practical, with clients, CDN, global load balancer/API gateway, stateless app services, stateful real-time servers, Redis pub/sub, metadata DB, object storage, op store, and async workers. The interaction flow is clear and maps well to the whiteboard requirements. Minor ambiguity remains around whether the realtime path synchronously appends to a durable log before acknowledgement.

Completeness

Weight 20%

It covers all requested areas: high-level architecture, WebSocket real-time communication, data model for boards and objects, persistence through snapshots and op logs, scalability, reliability, and trade-offs. It also includes presence, cursors, assets, permissions, and conflict resolution, making it very complete.

Trade-off Reasoning

Weight 20%

The trade-off discussion is strong, focusing on latency and availability over strict consistency, with CRDTs and optimistic rendering. It clearly explains benefits and costs, including user-visible conflict outcomes and metadata overhead. The secondary trade-off around in-memory board ownership is also useful.

Scalability & Reliability

Weight 20%

It directly addresses scaling to 10,000 sessions and up to 1,000,000 concurrent WebSocket connections, using horizontal scaling, session sharding, clustered Redis, scalable op storage, snapshots, multi-AZ deployment, failover, backpressure, and client reconnect. The main gap is that the durability path for operations could be specified more rigorously to avoid loss during realtime server crashes.

Clarity

Weight 10%

The answer is very clear, with a readable diagram, well-labeled sections, concrete examples, and concise explanations of each subsystem. The terminology is mostly consistent, though the mix of CRDT, server sequencing, and last-writer-wins could be clarified further.

Answer B: OpenAI GPT-5.4

High-Level Architecture

A good design is a hybrid of stateless APIs, stateful real-time collaboration workers, durable storage, and an event stream.

Main components:

Web/mobile clients
CDN for static assets
API Gateway / Load Balancer
Auth service
Whiteboard metadata service
Real-time collaboration service
Session router / presence service
Message bus / event stream
Persistence pipeline
Primary database for metadata
Object/document store for whiteboard snapshots and element state
Cache (Redis)
Monitoring, tracing, rate limiting

Text diagram:
Client
-> CDN for app assets
-> API Gateway / LB
-> Auth Service
-> Whiteboard API Service
-> Session Router
-> Real-Time Collaboration Nodes (WebSocket)
-> Redis / in-memory session state
-> Pub/Sub or Kafka topic per shard
-> Persistence workers
-> Snapshot Store
-> Metadata DB

Flow:

User opens app and authenticates.
Client fetches whiteboard metadata and latest persisted state through REST/HTTP APIs.
Client then upgrades to WebSocket for the board session.
Session router sends the client to the collaboration node responsible for that whiteboard.
Users send operations such as draw stroke segment, create text box, move object, edit sticky note.
Collaboration node validates, sequences, and broadcasts operations to all participants in the session.
Operations are appended to an event log and periodically compacted into snapshots.
On reconnect or reopen, client loads latest snapshot plus recent operations.

Practical partitioning:

Use consistent hashing or a board-to-node mapping service so one board session is owned by one collaboration node at a time.
This simplifies ordering and conflict handling.
With 10,000 concurrent sessions and up to 100 users/session, the system is large but manageable with horizontal scaling of collaboration nodes.

Real-Time Communication

Protocol:

WebSocket over TLS is the primary choice.
Reason: low-latency bidirectional communication, broad browser/mobile support, simpler than long polling, and efficient for frequent small updates.

Update model:

Client sends operations, not full board state.
Example operations:
- start_stroke, append_stroke_points, end_stroke
- create_text, edit_text
- create_sticky, move_object, resize_object, delete_object
Collaboration node timestamps/sequences operations and broadcasts them to all clients in that board.

Ordering and fanout:

Within a single board, maintain a monotonically increasing sequence number.
All operations from that board pass through its owner collaboration node, which provides total order per board.
Broadcast locally to connected clients; if replicas or multiple nodes serve the same board for failover, use Redis Pub/Sub or Kafka for replication/eventing.

Latency target under 200ms:

Keep collaboration nodes geographically close to users using regional deployments.
Use sticky routing so all users of a board hit the same node in-region when possible.
Broadcast deltas only, compress payloads, batch pen points every few milliseconds.
For freehand drawing, allow local optimistic rendering immediately on the sender before server ack, then reconcile if needed.

Conflict handling:

For object-based edits like move text box or edit sticky note, use per-object versioning.
For highly concurrent state, use operational transform or CRDT-inspired merging if rich multi-user editing is needed.
For a high-level whiteboard system, a simpler model is acceptable:
- Pen strokes are append-only and naturally merge well.
- Object transforms use last-write-wins or server ordering.
- Text content can use simpler whole-object edit locking initially, or later evolve to OT/CRDT for concurrent text editing.
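
One way to read "per-object versioning" is as an optimistic concurrency check on the owning collaboration node: a patch is accepted only if it was based on the element's current version. The sketch below assumes that interpretation; `Element`, `StaleVersion`, and `apply_patch` are hypothetical names, and a real server might fall back to last-write-wins instead of rejecting.

```python
from dataclasses import dataclass, field


class StaleVersion(Exception):
    """Raised when a patch was computed against an outdated element version."""


@dataclass
class Element:
    element_id: str
    props: dict = field(default_factory=dict)
    version: int = 0


def apply_patch(element: Element, patch: dict, base_version: int) -> int:
    """Apply a property patch if the client saw the latest version; return the new version."""
    if base_version != element.version:
        raise StaleVersion(
            f"{element.element_id}: client based on v{base_version}, server at v{element.version}"
        )
    element.props.update(patch)
    element.version += 1
    return element.version
```

A client whose patch is rejected would refetch or receive the newer state, rebase its change, and resend. That keeps object edits simple while leaving append-only pen strokes outside the check.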

Data Model

Use a split model: metadata relational DB + board content in document/blob storage + operation log.

A. Whiteboard metadata
Table: Whiteboard

board_id
owner_user_id
title
created_at
updated_at
last_snapshot_id
access_policy
region
status

Table: WhiteboardMember

board_id
user_id
role (owner/editor/viewer)
invited_at

Table: ActiveSession

session_id
board_id
user_id
connection_id
joined_at
last_heartbeat
presence_state

B. Board content model
Represent board as a collection of elements on an infinite or bounded canvas.

Element base fields:

element_id
board_id
type: stroke | text | sticky_note | shape
created_by
created_at
updated_at
z_index
position {x, y}
rotation
style object
version
deleted flag

Stroke element:

element_id
points: polyline or Bezier-compressed point list
color
width
opacity

Text element:

element_id
text_content
font_family
font_size
bounding_box

Sticky note element:

element_id
text_content
background_color
bounding_box

C. Operation log
Table or stream: BoardOperation

op_id
board_id
seq_no
user_id
op_type
target_element_id
payload
client_timestamp
server_timestamp

Payload examples:

create element with initial properties
append stroke points
move object from old position to new position
patch text
delete element

D. Snapshots
Snapshot object stored in document/blob store:

snapshot_id
board_id
base_seq_no
created_at
serialized board state or chunked spatial partitions

Why snapshots + operation log:

Replaying entire history forever becomes too slow.
Snapshots allow fast load.
Recent ops after snapshot restore the latest state.

Optional optimization for large boards:

Spatial chunking: partition canvas into tiles/regions so clients load only visible content.
Useful if boards become very large.

Scalability and Reliability Strategy

Scalability

A. Horizontal scaling of collaboration nodes

Collaboration service is the critical path.
Each node maintains in-memory active board sessions assigned to it.
Sessions are sharded by board_id.
If average session size is modest, many sessions can be hosted per node.
Load balancer + session router ensure users join the correct node.

B. Event-driven persistence

Do not synchronously write every pen point to the primary DB before broadcast; that would hurt latency.
Instead:
- collaboration node accepts operation
- appends to durable log / replicated queue
- broadcasts immediately
- async workers persist and compact to snapshots
For durability, use Kafka/Pulsar or a replicated write-ahead log before ack if stronger guarantees are needed.

C. Efficient drawing traffic

Pen tool can generate many points per second.
Reduce volume by:
- client-side point simplification
- batching points every 10-30ms
- binary encoding instead of verbose JSON if needed
- gzip/permessage-deflate on WebSocket
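
The first two reductions above can be sketched on the client side as follows. The answer does not name a simplification algorithm; Ramer-Douglas-Peucker is used here as one common choice, and the function names, the `(t_ms, x, y)` sample shape, and the window handling are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _distance_to_line(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / length


def simplify(points: List[Point], epsilon: float) -> List[Point]:
    """Ramer-Douglas-Peucker: drop points closer than epsilon to the simplified path."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, worst = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _distance_to_line(points[i], first, last)
        if d > worst:
            index, worst = i, d
    if worst <= epsilon:
        return [first, last]
    left = simplify(points[: index + 1], epsilon)
    right = simplify(points[index:], epsilon)
    return left[:-1] + right  # the split point appears in both halves; keep it once


def batch_by_window(samples: List[Tuple[float, float, float]], window_ms: float) -> List[List[Point]]:
    """Group (t_ms, x, y) samples into batches, starting a new batch every window_ms."""
    batches: List[List[Point]] = []
    current: List[Point] = []
    start = None
    for t, x, y in samples:
        if start is None or t - start >= window_ms:
            if current:
                batches.append(current)
            current, start = [], t
        current.append((x, y))
    if current:
        batches.append(current)
    return batches
```

Each batch would become one `append_stroke_points` message, so a 10-30ms window turns many pointer events into a few dozen messages per second per active pen. The recursive `simplify` is fine for short batches but would need an iterative form for very long strokes.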

D. Caching

Redis cache for board metadata, presence, routing map, hot snapshots.
Recent boards and permissions can be cached to reduce DB load.

E. Storage strategy

Relational DB (PostgreSQL/MySQL) for users, boards, permissions, sharing metadata.
Document/blob store (S3/object storage or document DB) for snapshots.
Stream/log system for operations.
This avoids forcing one database to handle all access patterns.

Reliability / High Availability

A. Multi-AZ deployment

Run API, collaboration nodes, Redis, and databases across multiple availability zones.
Load balancers health-check instances and reroute automatically.
99.9% uptime is achievable with multi-AZ and rolling deploys.

B. Collaboration node failover

Because board ownership is stateful, node failure is important.
Mitigation:
- operations are continuously replicated to durable log
- periodic in-memory checkpoints or fast snapshots
- session router detects node loss and reassigns board to another node
- clients reconnect automatically and reload snapshot + tail ops
Short disruption is acceptable; users recover quickly.

C. Presence and heartbeats

WebSocket heartbeats detect disconnects.
Presence service updates active user state with TTLs in Redis.
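
The heartbeat-plus-TTL idea can be illustrated without a Redis dependency. The sketch below is an in-memory stand-in for expiring keys, not Redis client code; `PresenceTracker`, the 15-second default TTL, and the injectable clock are assumptions for the example.

```python
import time
from typing import Callable, Dict, Set, Tuple


class PresenceTracker:
    """In-memory stand-in for TTL-based presence keys (hypothetical)."""

    def __init__(self, ttl_seconds: float = 15.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at: Dict[Tuple[str, str], float] = {}  # (board_id, user_id) -> expiry

    def heartbeat(self, board_id: str, user_id: str) -> None:
        """Refresh the user's presence; called on every WebSocket heartbeat."""
        self._expires_at[(board_id, user_id)] = self._clock() + self._ttl

    def active_users(self, board_id: str) -> Set[str]:
        """Return users on this board whose presence entry has not expired."""
        now = self._clock()
        self._expires_at = {k: exp for k, exp in self._expires_at.items() if exp > now}
        return {user for (board, user) in self._expires_at if board == board_id}
```

A user who stops sending heartbeats simply ages out after the TTL, so a crashed client or a dropped connection needs no explicit cleanup message.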

D. Backpressure and protection

Per-user and per-board rate limiting.
Drop or coalesce excessive move/drag events.
Separate control plane traffic from data plane.
Protect against a noisy session consuming all node resources.
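
Coalescing drag events might look like the sketch below, which collapses consecutive moves of the same element into the most recent one before broadcast. It assumes move ops carry absolute positions (so intermediate ones can be dropped safely) and reuses the `op_type` and `target_element_id` field names from the operation log; the function name is hypothetical.

```python
from typing import List


def coalesce_moves(ops: List[dict]) -> List[dict]:
    """Keep only the last of each run of consecutive move ops on the same element."""
    out: List[dict] = []
    for op in ops:
        last = out[-1] if out else None
        if (
            last is not None
            and op["op_type"] == "move_object"
            and last["op_type"] == "move_object"
            and last["target_element_id"] == op["target_element_id"]
        ):
            out[-1] = op  # a newer absolute position supersedes the older one
        else:
            out.append(op)
    return out
```

Only adjacent moves are merged, so the relative order of all other operations is preserved.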

E. Durability choices
Two options:

Lower latency: ack after operation is accepted in memory and replicated asynchronously.
Higher durability: ack only after append to replicated log.
I would choose ack-after-log-append for object mutations and batched treatment for pen segments to balance reliability and speed.
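
The chosen policy (ack-after-log-append for object mutations, batched handling for pen segments) could be sketched as below. `DurabilityGate`, the `log_append` callable (assumed to return only once a batch is durably appended), the batch size, and the two ack strings are all hypothetical; no specific Kafka or Pulsar API is implied.

```python
from typing import Callable, List


class DurabilityGate:
    """Decides when an operation is acknowledged relative to the durable log (sketch)."""

    def __init__(self, log_append: Callable[[List[dict]], None], pen_batch_size: int = 3):
        self._log_append = log_append  # assumed to block until the batch is durable
        self._pen_batch_size = pen_batch_size
        self._pen_buffer: List[dict] = []

    def submit(self, op: dict) -> str:
        """Return 'accepted' (acked before durable) or 'durable' (acked after append)."""
        if op["op_type"] == "append_stroke_points":
            self._pen_buffer.append(op)
            if len(self._pen_buffer) >= self._pen_batch_size:
                self.flush_pen()
            return "accepted"
        # Object mutation: write any buffered pen ops first so log order matches arrival order.
        batch, self._pen_buffer = self._pen_buffer + [op], []
        self._log_append(batch)
        return "durable"

    def flush_pen(self) -> None:
        """Append buffered pen segments to the log as a single batch."""
        if self._pen_buffer:
            batch, self._pen_buffer = self._pen_buffer, []
            self._log_append(batch)
```

In this sketch a crash can lose at most one unflushed pen batch, while every acknowledged object mutation is already in the log. A real node would also flush on a timer, not only on batch size.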

Capacity reasoning

10,000 concurrent sessions × an average of perhaps 10 active users per session is about 100,000 concurrent connections; the worst case is 1,000,000 if every session has 100 users, but that is unlikely to be the intended steady state. The design should support hundreds of thousands of WebSockets through horizontal scaling.
Collaboration nodes can each support tens of thousands of idle/light connections or fewer for heavy sessions, so scale out with autoscaling and shard by active board count and event rate.

Trade-off

Major trade-off: consistency vs. latency.

I would prioritize low latency and smooth collaboration over strict global consistency for every action.

What this means:

Clients optimistically render their own drawing immediately.
Server provides authoritative ordering per board, but some operations may be merged using simple rules like server order or last-write-wins.
Persistence is near-real-time rather than blocking every update on a database transaction.

Why this trade-off is good here:

Whiteboard UX is highly interactive; users notice lag much more than occasional minor reconciliation.
Pen strokes and object moves are tolerant to eventual consistency within a very small window.
Strict serializable transactions for all edits would increase latency and reduce throughput.

Cost of this choice:

During concurrent edits to the same object, users may see small jumps or overwritten updates.
Recovering from node failure may replay a short tail of operations.
Rich concurrent text editing may need a more advanced OT/CRDT mechanism later.

Summary design

REST APIs for auth, board metadata, snapshots.
WebSockets for live collaboration.
One logical owner collaboration node per board for sequencing and fanout.
Redis/cache for routing and presence.
Kafka/PubSub/event log for durable operation streaming.
Relational DB for metadata/permissions, object store or document store for snapshots and board state.
Multi-AZ deployment, autoscaling, reconnection/failover, and asynchronous snapshotting for reliability and scale.

This architecture meets the near-real-time requirement, supports persistence, scales to many active sessions, and keeps the system operationally practical.

Result

Winning Votes

0 / 3

Average Score

Judge Model: Google Gemini 2.5 Flash

Total Score

Overall Comments

Answer B offers a very strong and complete system design. It covers all required aspects thoroughly, with good explanations for each component and strategy. The discussion on real-time communication and scalability is detailed, including practical considerations like capacity reasoning and different durability choices. While comprehensive, its architecture description is slightly less visually impactful than Answer A's, and its initial approach to conflict resolution is a bit more conservative before mentioning advanced techniques.

Architecture Quality

Weight 30%

The architecture description is solid, listing main components and flow. However, the text diagram is less detailed and visually impactful compared to Answer A's, making it slightly harder to grasp the full system at a glance.

Completeness

Weight 20%

Answer B is very complete, addressing all aspects of the prompt with good detail. The inclusion of a 'Summary design' at the end is a nice touch for recap, and the conflict handling discussion is thorough.

Trade-off Reasoning

Weight 20%

Answer B clearly identifies the consistency vs. latency trade-off, providing strong justification, benefits, and costs. The reasoning is sound and directly applicable to the problem, though it lacks a secondary trade-off discussion.

Scalability & Reliability

Weight 20%

Answer B offers an extremely detailed and practical approach to scalability and reliability, including event-driven persistence, efficient drawing traffic, multi-AZ, and specific failover mechanisms. The 'Capacity reasoning' section is a strong addition.

Clarity

Weight 10%

Answer B is very clear and well-organized with logical headings and explanations. The language is precise, making the design easy to understand, though the architecture diagram is less intuitive than A's.

Judge Models Anthropic Claude Sonnet 4.6

Total Score

Overall Comments

Answer B is a solid, comprehensive system design that covers all required sections. It is well-organized and readable, with clear prose explanations. It includes a text-based flow diagram, detailed data model with field-level specifics, and a good trade-off discussion. However, it is somewhat less precise in certain areas: the CRDT/OT discussion is more tentative ("if needed," "can evolve"), the architecture diagram is less detailed than A's, and the trade-off analysis, while adequate, lacks the depth and insight of A's. The capacity reasoning section is a nice addition. Overall a strong answer but slightly below A in technical depth and precision.

Architecture Quality

Weight 30%

B provides a clear text-based flow diagram and good component enumeration. The architecture is sound and practical. However, the diagram is less detailed, and some component interactions (e.g., how Redis integrates with collaboration nodes) are less precisely described. The session router concept is mentioned but not as deeply elaborated.

Completeness

Weight 20%

B also covers all five sections and adds useful extras like spatial chunking, capacity reasoning, and binary encoding optimizations. The data model is detailed with field-level specifics. Slightly less precise in some areas (e.g., CRDT is mentioned as optional/future), but overall very complete.

Trade-off Reasoning

Weight 20%

B's trade-off discussion is adequate but more generic. The consistency vs latency framing is correct, but the analysis lacks the depth of A's. The mention of OT/CRDT as a future evolution rather than a current design choice weakens the argument. The cost analysis is present but less nuanced.

Scalability & Reliability

Weight 20%

B covers multi-AZ, autoscaling, collaboration node failover, presence heartbeats, backpressure, and durability options. The capacity reasoning section is a nice addition. Slightly less precise on specific numbers and the failover rehydration path, but overall solid.

Clarity

Weight 10%

B is very readable with clean prose, clear section headers, and well-structured lists. The text diagram is simpler and easier to follow. Slightly less technical depth but very accessible. Both answers are clear; B may be marginally easier to read for non-experts.

Judge Models OpenAI GPT-5.5

Total Score

Overall Comments

Answer B is also a strong system design answer with a practical component breakdown, clear WebSocket communication model, detailed data model, and good operational considerations such as batching, compression, rate limiting, failover, and durable log trade-offs. It is especially nuanced about persistence latency and durability choices. Its main weaknesses are that the architecture is slightly less concrete than Answer A's, and its capacity discussion partially softens the stated worst-case target instead of fully committing to supporting 1,000,000 concurrent WebSocket connections.

Architecture Quality

Weight 30%

The architecture is solid and includes the right major components: API gateway, auth, metadata service, session router, collaboration nodes, event stream, persistence workers, databases, cache, and monitoring. The flow is practical, but it is somewhat less concrete than Answer A's design and does not describe component interaction or ownership/failover topology quite as precisely.

Completeness

Weight 20%

It covers all core requirements and requested sections thoroughly, including architecture, realtime propagation, data model, scalability, reliability, and trade-offs. It also adds useful details like spatial chunking, point simplification, and backpressure. It is slightly less direct on meeting the absolute worst-case concurrency target.

Trade-off Reasoning

Weight 20%

The trade-off section is very good and operationally realistic. It explains low latency versus strict consistency, optimistic rendering, server ordering, near-real-time persistence, and the cost for concurrent edits and failure recovery. It also earlier discusses ack-after-memory versus ack-after-log, adding useful depth.

Scalability & Reliability

Weight 20%

It provides strong scalability and reliability strategies, including sharded collaboration nodes, event-driven persistence, batching, compression, caching, multi-AZ deployment, failover, presence TTLs, rate limiting, and durable log options. However, the capacity section somewhat retreats from the stated maximum by assuming lower average session size and saying the design should support hundreds of thousands rather than clearly planning for the full million concurrent connections.

Clarity

Weight 10%

The answer is clear and well organized, with logical sections, bullet points, examples, and a summary. It is easy to follow, though it is more verbose and slightly less visually structured than Answer A.

Comparison Summary

Final rank order is determined by judge-wise rank aggregation (average rank + Borda tie-break). Average score is shown for reference.

Judges: 3

Winner Anthropic Claude Opus 4.8

Winning Votes

3 / 3

Average Score

OpenAI GPT-5.4

Winning Votes

0 / 3

Average Score

Judging Results

Judge Models OpenAI GPT-5.5

Claude Opus 4.8 (Winner): 85

GPT-5.4: 83

Why This Side Won

Answer A wins because, under the weighted criteria, it is slightly stronger on the most important area: architecture quality. It gives a more concrete end-to-end architecture, clearer component interactions, stronger treatment of per-board routing and live state ownership, and a more explicit plan for the worst-case scale of 10,000 sessions with up to 100 users each. Answer B is very close and has excellent operational nuance, but Answer A is more complete and directly aligned with the requested high-level system design.

Judge Models Anthropic Claude Sonnet 4.6

Claude Opus 4.8 (Winner): 86

GPT-5.4: 75

Why This Side Won

Answer A wins on the most heavily weighted criteria. In architecture quality (30%), A provides a more detailed and precise diagram with clearer component interactions, explicit consistent hashing, and a more rigorous explanation of the real-time fan-out mechanism. In trade-off reasoning (20%), A's discussion of AP vs CP with CRDTs is more insightful and demonstrates deeper understanding of distributed systems implications. In scalability and reliability (20%), A provides more concrete strategies including specific connection count estimates, snapshotting cadence, and failover rehydration details. Both answers are comparable in completeness and clarity, but A's technical depth across the weighted criteria gives it a clear overall advantage.

Judge Models Google Gemini 2.5 Flash

Claude Opus 4.8 (Winner): 90

GPT-5.4: 89

Why This Side Won

Answer A is chosen as the winner due to its superior architecture quality and more insightful trade-off reasoning, which are heavily weighted criteria. The detailed text diagram in Answer A provides a clearer and more immediate understanding of the system's structure and component interactions. Additionally, Answer A's explicit integration of CRDTs into the core real-time server logic from the outset, and its more nuanced discussion of a secondary trade-off, demonstrate a slightly deeper understanding of the problem's complexities. While Answer B is also excellent and provides strong details, Answer A's presentation and specific design choices for real-time collaboration give it a decisive edge.
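
The Comparison Summary says the final order comes from judge-wise rank aggregation (average rank with a Borda tie-break). The page does not define the exact formulas, so the following is only a sketch under assumptions: each judge ranks the answers by its own total score, Borda points are `number of answers - rank`, and no judge gives two answers the same score. The scores are the three judge totals listed above; the function name `aggregate` is hypothetical.

```python
from statistics import mean

# Total scores per judge, as listed in the Judging Results above.
scores = {
    "OpenAI GPT-5.5": {"Claude Opus 4.8": 85, "GPT-5.4": 83},
    "Anthropic Claude Sonnet 4.6": {"Claude Opus 4.8": 86, "GPT-5.4": 75},
    "Google Gemini 2.5 Flash": {"Claude Opus 4.8": 90, "GPT-5.4": 89},
}


def aggregate(judge_scores):
    """Rank answers by average per-judge rank, then by Borda points (assumed rules)."""
    answers = sorted({name for per_judge in judge_scores.values() for name in per_judge})
    ranks = {name: [] for name in answers}
    borda = {name: 0 for name in answers}
    for per_judge in judge_scores.values():
        # Highest score gets rank 1; ties within one judge are not handled here.
        ordered = sorted(answers, key=lambda name: -per_judge[name])
        for position, name in enumerate(ordered):
            ranks[name].append(position + 1)
            borda[name] += len(answers) - 1 - position
    avg_rank = {name: mean(r) for name, r in ranks.items()}
    order = sorted(answers, key=lambda name: (avg_rank[name], -borda[name]))
    return order, avg_rank, borda


order, avg_rank, borda = aggregate(scores)
print(order)      # final rank order under the assumed rules
print(avg_rank)   # average rank per answer
print(borda)      # Borda points per answer
```

With these inputs every judge ranks Claude Opus 4.8 first, which matches the 3 / 3 and 0 / 3 winning votes shown in the summary. Under the assumptions above, Borda points are a linear function of rank, so the site's actual tie-break presumably differs in some detail that the page does not show.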
