Answer A: OpenAI GPT-5.4
High-level design

Build an event-driven real-time tracking platform with three main paths: driver ingest, real-time fanout, and query/storage. Drivers continuously publish GPS updates. The backend validates and enriches them, updates the driver's latest position, maps the driver to any active delivery order, and pushes updates only to the customer tracking that order.

1. Driver mobile to backend

Each driver app runs a lightweight location publisher:
- Collect GPS coordinates every few seconds, for example every 2 to 5 seconds while on an active delivery, less frequently when idle.
- Include driver_id, current order_id if assigned, latitude, longitude, speed, heading, timestamp, app version, battery/network hints, and a sequence number.
- Apply client-side throttling and movement-based filtering so the app avoids sending unchanged positions.
- Batch or coalesce updates during poor connectivity, then send the latest one first when reconnected.

Recommended protocol from driver to backend:
- HTTPS for periodic location upload is the simplest and most robust choice.
- Use a small POST request to a Location Ingest API.
- For very high efficiency, gRPC streaming is also a strong option if mobile support and operational maturity are available.

Practical choice:
- Start with HTTPS because it works well through mobile networks, proxies, and existing API gateways.
- Optimize with compression, compact payloads, adaptive send frequency, and regional edge endpoints.

Ingest flow:
Driver App → API Gateway or Load Balancer → Authentication and rate limiting → Location Ingest Service → Message broker for async processing

2. Backend services

Core services:
- API Gateway: terminates TLS, authenticates drivers and customers, applies rate limits.
- Location Ingest Service: validates payloads, drops stale or duplicate updates, timestamps events, publishes to a broker.
- Message Broker: Kafka, Pulsar, or Kinesis for durable high-throughput event streaming.
- Driver State Service: consumes location events and maintains the latest known driver state in a fast store such as Redis or DynamoDB.
- Order Tracking Service: maps driver_id to the active order_id and customer subscription channels.
- Realtime Fanout Service: pushes location updates to the correct customer connection.
- Order Service: source of truth for the order lifecycle, assignment, status changes, restaurant pickup, and delivery completion.
- ETA Service: optionally recalculates the ETA using the latest traffic-aware route and driver movement.
- Historical Storage Service: stores location history for debugging, analytics, dispute resolution, and ML.
- Monitoring and Alerting: tracks latency, dropped messages, stale driver positions, and regional outages.

Processing pipeline:
1. Driver sends a location update.
2. The ingest service validates auth, schema, timestamp freshness, and plausibility.
3. The event is written to the broker.
4. The Driver State consumer updates the latest-location cache keyed by driver_id.
5. The Order Tracking consumer checks whether the driver is currently assigned to an active order.
6. If so, it publishes a customer-scoped tracking event.
7. The Realtime Fanout service sends the update to the subscribed customer app.
8. The historical consumer stores events in long-term storage.

3. Customer mobile receiving real-time updates

Recommended pattern:
- The customer app opens a WebSocket connection after entering the order tracking screen.
- The app authenticates and subscribes to a single order tracking channel, keyed by order_id.
- The backend verifies the customer is authorized to view that order.
- The fanout service sends only updates for that order.
- On initial connect, the app receives a snapshot: latest driver location, order status, ETA, and last update time.
- It then receives incremental updates in near real time.

Fallbacks:
- If WebSockets are blocked or unstable, fall back to Server-Sent Events or short polling.
- For backgrounded apps, use push notifications only for major milestones, not continuous tracking.

4. Protocol choices and justification

Driver to backend: HTTPS POST
- Strong compatibility on mobile networks.
- Easier retries, auth, observability, and gateway integration.
- Good enough for 50,000 active drivers if updates are throttled sensibly.
- Less operational complexity than MQTT.

Customer to backend: WebSockets
- Best fit for server-to-client real-time updates.
- Avoids wasteful polling from 200,000 customers.
- Low latency and efficient for many small push messages.
- A customer typically tracks one order, so subscription logic is simple.

Internal communication: Kafka or a similar broker
- Decouples ingest from fanout and storage.
- Handles spikes, replay, and multiple downstream consumers.
- Supports partitioning for horizontal scale.

Why not polling for customers:
- With 200,000 active customers, frequent polling creates large unnecessary QPS even when the location has not changed.
- Higher latency and poorer battery/network efficiency.

Why not MQTT end-to-end:
- Technically suitable for mobile telemetry, but it adds client and broker complexity and may be unnecessary unless the organization already operates MQTT at scale.
- For this use case, HTTPS plus WebSockets is simpler and usually sufficient.

5. Data models

A. Driver latest location

Purpose: hot state for real-time reads.

Fields:
- driver_id
- lat
- lng
- geohash or spatial index key
- speed
- heading
- accuracy_meters
- recorded_at (from device)
- received_at (from server)
- sequence_number
- active_order_id (nullable)
- status, such as idle, heading_to_restaurant, waiting, delivering, offline

Store:
- Redis for ultra-fast latest-state reads and pub/sub metadata, or DynamoDB/Cassandra for durable, scalable key-value storage.
- TTL can be applied to expire stale entries.

Key example:
- driver_id as partition key

B. Driver location history

Purpose: analytics and replay.

Fields:
- driver_id
- timestamp
- lat
- lng
- speed
- heading
- active_order_id

Store:
- Time-series-friendly storage, object storage via a stream sink, or a wide-column database.
- Retention can be shorter for raw points and longer for summarized traces.

C. Order tracking model

Fields:
- order_id
- customer_id
- driver_id
- restaurant_id
- status, such as placed, preparing, driver_assigned, picked_up, en_route, delivered, cancelled
- pickup_location
- dropoff_location
- latest_driver_lat
- latest_driver_lng
- latest_driver_timestamp
- eta_seconds
- tracking_visibility (boolean)
- assigned_at, picked_up_at, delivered_at

Store:
- Primary order record in a relational DB or distributed transactional store.
- Frequently changing tracking projection in Redis or DynamoDB for low-latency reads.

D. Subscription/session model

Fields:
- connection_id
- customer_id
- order_id
- connected_at
- last_heartbeat_at
- region

Store:
- In-memory store such as Redis, or a managed WebSocket gateway connection registry.

6. Scaling strategy for peak load

Traffic estimation:
- If 50,000 active drivers send updates every 5 seconds on average, that is about 10,000 location updates per second at peak.
- If updates arrive every 2 seconds during active delivery bursts, that is about 25,000 updates per second.
- This is well within the range of a partitioned event-driven system.

Scaling approach:

A. Stateless horizontal scaling
- Scale the API Gateway, Ingest Service, and Fanout Service horizontally behind load balancers.
- Keep request handling stateless; store session and subscription metadata in shared fast storage.

B. Partitioned event streaming
- Partition location events by driver_id so ordering is preserved per driver.
- Scale consumers by adding partitions and consumer instances.
- Use separate consumer groups for driver state, customer fanout, ETA, and storage.

C. Fast hot-state storage
- Use a Redis cluster or similar for the latest location and order tracking projection.
- Keep only current state in the cache; use durable systems for the source of truth and history.
- Use TTL and eviction for stale drivers.

D. Region-based deployment
- Deploy in multiple geographic regions.
- Route drivers to the nearest region for ingest to reduce latency.
- Keep customer tracking in the same region as the order when possible.
- Use cross-region replication only for required metadata, not for every raw event globally.

E. Backpressure and degradation
- If the system is overloaded, coalesce updates and publish only the latest driver position per small time window.
- Dynamically reduce the update frequency for slow-moving or stopped drivers.
- Prioritize active tracked orders over idle-driver telemetry.
- Drop clearly stale, superseded events in the pipeline.

F. Efficient fanout
- Push only to the customer associated with the order, not to broad geo subscriptions.
- Send deltas or compact payloads.
- Limit the update frequency to what is visually useful, for example at most one update every 1 to 2 seconds to the customer app, even if the backend receives updates more often.

G. Reliability
- Use at-least-once delivery in the broker with idempotent consumers.
- Deduplicate by driver_id plus sequence_number or timestamp.
- Use heartbeats to detect disconnected customers and offline drivers.
- Persist the last known snapshot so reconnecting customers can recover immediately.

H. Observability

Track:
- ingest QPS
- end-to-end latency from driver send to customer receive
- stale location percentage
- WebSocket connection counts
- fanout failures and reconnects
- per-region broker lag
- cache hit ratio

Suggested end-to-end architecture:
- Driver App sends HTTPS location updates.
- Global Load Balancer routes to the nearest region.
- API Gateway authenticates and rate limits.
- Location Ingest Service validates and publishes to Kafka.
- Driver State Service updates the latest location in Redis or DynamoDB.
- Order Tracking Service joins with the active order assignment and computes the tracking payload.
- Realtime Fanout Service sends updates over WebSocket to the subscribed customer.
- Order Service manages lifecycle transitions.
- Historical sink stores raw events in long-term storage.
- ETA Service periodically updates the ETA shown in the customer app.

Key design decisions summary:
- Use HTTPS from drivers for robust mobile ingest.
- Use WebSockets to customers for low-latency targeted push.
- Use a broker to decouple ingest, state updates, fanout, and storage.
- Store the latest state in a fast key-value store and historical traces separately.
- Scale horizontally with partitioning, regionalization, caching, and adaptive throttling.

This design provides low-latency, targeted real-time updates, operational simplicity, and enough headroom to support tens of thousands of drivers and hundreds of thousands of tracking customers during peak demand.
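The movement-based filtering described in section 1 can be sketched in a few lines. This is an illustrative client-side check, not part of any SDK; the `should_send` helper and the 15-meter threshold are assumptions chosen for the example:

```python
import math

def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in meters between two GPS fixes."""
    r = 6_371_000  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lng2 - lng1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def should_send(prev, curr, min_move_m: float = 15.0) -> bool:
    """Movement-based filter: skip the upload when the driver has not
    moved far enough for the customer-facing map to visibly change."""
    if prev is None:  # no previous fix: always send the first one
        return True
    return haversine_m(prev[0], prev[1], curr[0], curr[1]) >= min_move_m

print(should_send((37.7749, -122.4194), (37.7749, -122.4194)))  # False (no movement)
print(should_send((37.7749, -122.4194), (37.7760, -122.4194)))  # True (~120 m moved)
```

In practice the same check would be combined with a minimum send interval, so a stopped driver still emits an occasional heartbeat fix rather than going silent.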
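Section 3's order-scoped subscriptions can be illustrated with a minimal in-memory registry. In production this metadata would live in Redis or a managed WebSocket gateway connection registry as the design suggests; the `FanoutRegistry` class and its method names here are hypothetical:

```python
from collections import defaultdict

class FanoutRegistry:
    """Maps order_id -> subscribed connection ids, so a driver update is
    pushed only to the customers tracking that specific order."""

    def __init__(self) -> None:
        self._subs: dict[str, set[str]] = defaultdict(set)

    def subscribe(self, order_id: str, connection_id: str) -> None:
        # Called after the backend has verified the customer may view the order.
        self._subs[order_id].add(connection_id)

    def unsubscribe(self, order_id: str, connection_id: str) -> None:
        self._subs[order_id].discard(connection_id)

    def targets(self, order_id: str) -> set[str]:
        """Connections that should receive an update for this order."""
        return set(self._subs.get(order_id, ()))

reg = FanoutRegistry()
reg.subscribe("order-42", "conn-a")
reg.subscribe("order-42", "conn-b")
reg.subscribe("order-99", "conn-c")
print(sorted(reg.targets("order-42")))  # ['conn-a', 'conn-b']
```

Because a customer typically tracks a single order, each lookup returns at most a handful of connections, which is what keeps the fanout cheap compared with broad geo subscriptions.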
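The traffic estimate in section 6 is easy to verify with a back-of-the-envelope calculation. The driver count and update intervals are the figures from the design above; the 100-partition count used for the per-partition rate is an assumption for illustration only:

```python
# Back-of-the-envelope ingest load for the tracking pipeline.
ACTIVE_DRIVERS = 50_000

def updates_per_second(drivers: int, interval_s: float) -> float:
    """Aggregate update rate if each driver sends once per interval."""
    return drivers / interval_s

steady = updates_per_second(ACTIVE_DRIVERS, 5)  # every 5 s -> 10,000 updates/s
burst = updates_per_second(ACTIVE_DRIVERS, 2)   # every 2 s -> 25,000 updates/s

# With events partitioned by driver_id across, say, 100 broker partitions,
# each partition sees only a fraction of the burst rate.
per_partition_burst = burst / 100               # 250 updates/s per partition

print(steady, burst, per_partition_burst)
```

Even the burst figure is modest per partition, which supports the claim that a partitioned event-driven system handles this load comfortably.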
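The reliability rule in section 6G (deduplicate by driver_id plus sequence_number under at-least-once delivery) can be sketched as an idempotent consumer in front of the latest-state cache. `LocationUpdate` and `LatestStateCache` are illustrative names, not a real library:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LocationUpdate:
    driver_id: str
    seq: int  # client-side sequence number
    lat: float
    lng: float

class LatestStateCache:
    """Keeps only the freshest update per driver, dropping the stale or
    duplicate events that at-least-once delivery can replay."""

    def __init__(self) -> None:
        self._last_seq: dict[str, int] = {}
        self.state: dict[str, LocationUpdate] = {}

    def apply(self, u: LocationUpdate) -> bool:
        """Return True if the update was fresh and applied."""
        last = self._last_seq.get(u.driver_id, -1)
        if u.seq <= last:  # duplicate or out-of-order: ignore
            return False
        self._last_seq[u.driver_id] = u.seq
        self.state[u.driver_id] = u
        return True

cache = LatestStateCache()
cache.apply(LocationUpdate("d1", 1, 37.770, -122.410))
cache.apply(LocationUpdate("d1", 3, 37.780, -122.420))
applied = cache.apply(LocationUpdate("d1", 2, 37.775, -122.415))  # stale replay
print(applied, cache.state["d1"].seq)  # False 3
```

Applying the same event twice leaves the cache unchanged, which is exactly the idempotence property the broker's at-least-once semantics require from consumers.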
Result
Winning Votes: 3 / 3
Overall Comments
Strong, coherent event-driven architecture that clearly covers driver ingest, backend processing, targeted fanout to customers, protocol choices, data models, and scaling tactics. Good practical considerations (throttling, filtering, batching, fallbacks, regional routing, backpressure, idempotency). Minor gaps: limited discussion of security/privacy details (token scopes, PII, encryption at rest), exact WebSocket scaling approach (sticky sessions vs managed gateway), and more explicit capacity reasoning for 200k concurrent sockets and fanout throughput, though it is generally implied.
Score Details
Architecture Quality
Weight 30%. Well-structured end-to-end design with clear separation of concerns (ingest, broker, state, order join, fanout, history, ETA). Event streaming backbone and hot-state store are appropriate, and the flow from driver updates to customer-specific updates is logically connected.
Completeness
Weight 20%. Directly addresses all six requested aspects, including client behaviors, backend services, customer update mechanism, protocol justification, data models, and scaling. Could be more explicit on authZ rules per order, privacy/retention policies, and concrete WebSocket connection management details.
Trade-off Reasoning
Weight 20%. Gives solid justification for HTTPS vs MQTT and WebSockets vs polling, and mentions gRPC as an option with operational caveats. Some trade-offs could be deeper (e.g., cost/ops trade-offs of managed WebSocket gateways, Redis vs DynamoDB durability/latency, consistency needs for assignment joins).
Scalability & Reliability
Weight 20%. Good scaling plan: horizontal stateless services, partitioned streaming, TTL hot state, regionalization, backpressure/coalescing, and at-least-once with dedupe keys. Reliability aspects are covered, but it would be stronger with more explicit sizing for 200k concurrent WebSockets, multi-region failover strategy, and handling broker/Redis outages.
Clarity
Weight 10%. Easy to follow, well organized by prompt sections, with concrete examples of fields, pipeline steps, and scaling estimates. Terminology is consistent and the proposed components and interactions are clearly described.
Overall Comments
This is an excellent, comprehensive system design answer that thoroughly addresses all six aspects of the prompt. The architecture is coherent, well-structured, and demonstrates deep understanding of real-time systems at scale. The answer covers driver-to-backend communication, backend processing pipeline, customer-facing real-time updates, protocol justifications, data models, and scaling strategies in significant detail. It also goes beyond the minimum requirements by addressing practical concerns like battery consumption, backpressure, observability, graceful degradation, and fallback mechanisms. The protocol choices are well-justified with clear reasoning about why alternatives were not chosen. The data models are detailed with appropriate field selections and storage recommendations. The scaling strategy includes concrete traffic estimations and multiple complementary approaches. Minor areas for improvement include slightly more discussion of security considerations, geographic failover specifics, and perhaps a visual diagram description. Overall, this is a production-quality system design document.
Score Details
Architecture Quality
Weight 30%. The architecture is well-designed with clear separation of concerns across ingest, processing, fanout, and storage paths. The event-driven approach with Kafka as the central broker is appropriate for this use case. The pipeline from driver to customer is logically sound with proper decoupling. The inclusion of an ETA service, historical storage, and monitoring shows mature architectural thinking. The only minor gap is the lack of explicit discussion of failure modes for individual components and how the system handles partial outages gracefully beyond general backpressure mentions.
Completeness
Weight 20%. All six required aspects are thoroughly addressed. The answer covers driver-to-backend communication, backend services, customer real-time updates, protocol choices with justification, detailed data models with field-level specifications, and a comprehensive scaling strategy. It also includes additional valuable elements like fallback mechanisms, observability, backpressure handling, battery considerations, and a clear end-to-end architecture summary. The data models include four distinct models covering all necessary entities. Very little is missing from the prompt requirements.
Trade-off Reasoning
Weight 20%. The protocol justifications are strong and well-reasoned. The answer clearly explains why HTTPS was chosen over MQTT for driver ingest, why WebSockets were chosen over polling for customers, and why Kafka serves as the internal broker. The discussion of why not polling and why not MQTT end-to-end shows genuine trade-off analysis. The mention of gRPC as an alternative with conditions for when it would be appropriate adds depth. The adaptive frequency discussion balancing battery life against data freshness is practical. Could have slightly more discussion of consistency vs availability trade-offs in the data layer.
Scalability & Reliability
Weight 20%. The scaling strategy is comprehensive and realistic. Traffic estimation with concrete numbers (10K-25K updates per second) grounds the design in reality. The answer covers horizontal scaling of stateless services, partitioned event streaming, fast hot-state storage with TTL, regional deployment, backpressure and graceful degradation, efficient targeted fanout, at-least-once delivery with idempotent consumers, and deduplication strategies. The reliability section covers heartbeats, reconnection snapshots, and stale data handling. The only minor gap is limited discussion of database replication strategies and disaster recovery specifics.
Clarity
Weight 10%. The answer is exceptionally well-organized with clear headings, numbered sections matching the prompt, and logical flow from component to component. The use of bullet points, labeled subsections, and a summary at the end makes it easy to follow. The processing pipeline is described as a clear step-by-step flow. Technical terms are used appropriately without unnecessary jargon. The suggested end-to-end architecture section provides a good summary. The only minor issue is that the length is substantial, but given the complexity of the topic, the detail is warranted and well-structured.
Overall Comments
The design provides a comprehensive and well-reasoned approach to building a real-time driver tracking system. It addresses all aspects of the prompt, offering practical technology choices, clear justifications, and a solid strategy for scalability and reliability. The architecture is detailed and considers potential issues like connectivity and load. A minor area for potential enhancement could be more explicit detail on client-side battery optimization beyond frequency throttling.
Score Details
Architecture Quality
Weight 30%. The architecture is robust, event-driven, and uses appropriate services and patterns (API Gateway, Message Broker, microservices, Redis/DynamoDB for hot state). It clearly outlines the data flow from driver ingest to customer fanout, demonstrating a strong understanding of distributed systems. The choice of HTTPS for drivers and WebSockets for customers is well-justified for the specific use case.
Completeness
Weight 20%. All six aspects of the prompt are thoroughly addressed. This includes driver data transmission, backend services, customer data reception, protocol choices with justifications, data models for different entities (driver location, history, order, subscription), and a detailed scaling strategy for peak load. The system's interconnections and data flow are clearly explained.
Trade-off Reasoning
Weight 20%. The reasoning for protocol choices (HTTPS vs. gRPC, WebSockets vs. polling, MQTT) is strong and well-contextualized. The justifications for using HTTPS for driver ingest due to compatibility and simplicity, and WebSockets for customer updates due to efficiency and low latency, are persuasive. The explanation for avoiding MQTT is also sensible, focusing on operational complexity.
Scalability & Reliability
Weight 20%. The scaling strategy is detailed, covering horizontal scaling, partitioned event streaming, fast hot-state storage, regional deployments, backpressure mechanisms, efficient fanout, and robust reliability measures like at-least-once delivery and idempotency. The traffic estimation provides a good basis for the scaling approach, and the observability points are crucial for maintaining reliability.
Clarity
Weight 10%. The answer is well-structured, using clear headings and bullet points to present complex information. The language is precise, and the overall flow of the design is easy to follow. The flows implied by the text (e.g., the processing pipeline and the end-to-end architecture summary) are coherent and effectively communicate the design intent.