Answer A: OpenAI GPT-5.5
A scalable notification service should be built as an asynchronous, event-driven platform that separates event ingestion, preference evaluation, notification persistence, channel-specific delivery, and delivery tracking. The most important design principle is that the source product systems that handle follows, likes, comments, and direct messages should not synchronously call mobile push providers, email providers, or user devices. They should publish durable events into a messaging layer, and the notification platform should process those events independently with strong retry and idempotency guarantees.

At a high level, the architecture contains these components: event producers, an ingestion API, a durable event log, notification processors, a user preference service, a template and personalization service, a notification store, channel fanout queues, channel-specific delivery workers, third-party provider integrations, a real-time gateway for in-app delivery, and observability/retry infrastructure.

Product services generate notification events when user-facing actions occur. For example, the social graph service emits a new follower event, the post service emits a like or comment event, and the messaging service emits a direct message event. Each event contains an event ID, event type, actor user ID, recipient user ID or recipient set, object ID, creation timestamp, and metadata needed for rendering. Producers send these events to a notification ingestion API or directly to a durable message bus.

The ingestion API validates schema, authenticates the producer, assigns or verifies an idempotency key, and writes the event to the durable log before acknowledging the producer. This prevents notification loss if downstream processors fail.

For the durable messaging backbone, I would use Apache Kafka, Amazon MSK, Google Pub/Sub, or Pulsar.
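The event envelope described above could be sketched roughly as follows. This is a minimal illustration, not a documented schema: the field names and the validation rules are assumptions for the purpose of the example.

```python
import time
import uuid
from dataclasses import dataclass, field

# Hypothetical event envelope for the notification pipeline; field names are
# illustrative assumptions, not a real product schema.
@dataclass(frozen=True)
class NotificationEvent:
    event_type: str            # "like", "comment", "follow", or "direct_message"
    actor_user_id: str
    recipient_user_id: str
    object_id: str             # post ID, comment ID, message ID, ...
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: float = field(default_factory=time.time)
    metadata: dict = field(default_factory=dict)  # rendering hints, deep links, ...

def validate(event: NotificationEvent) -> None:
    # Minimal schema check the ingestion API might run before the durable write.
    if event.event_type not in {"like", "comment", "follow", "direct_message"}:
        raise ValueError(f"unknown event type: {event.event_type}")
    if not event.recipient_user_id:
        raise ValueError("missing recipient")
```

Because the envelope is self-contained, a processor can retry or replay it without calling back into the producing service.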
Kafka/Pulsar are good fits because they provide high throughput, partitioned ordering, retention, replay, consumer groups, and durable storage. At 50,000 notification requests per second, the event stream should be partitioned by recipient user ID for user-level ordering where needed, or by event ID when strict per-user ordering is less important. Partitioning by recipient helps avoid out-of-order in-app notifications for a single user, but it can create hot partitions for celebrity accounts or group events.

For large fanout cases, such as one event producing notifications to millions of followers, a separate fanout service should split recipients into batches and publish derived per-recipient notification jobs across many partitions.

Notification processors consume raw events from the durable event log. Their responsibilities are to determine recipients, fetch user preferences, apply rate limits and quiet hours, deduplicate events, generate channel-specific notification records, and publish delivery jobs. For direct events like a comment on a user’s post, the recipient set is small. For fanout events such as a celebrity posting, the processor should avoid doing all fanout synchronously. It should create a fanout job and process recipients in shards, using batch reads from the social graph store. This prevents one very large event from blocking the low-latency path for normal notifications.

The user preference service stores configuration such as whether a user wants push, in-app, or email notifications for likes, comments, followers, and direct messages. Preferences should be stored in a highly available database such as DynamoDB, Cassandra, ScyllaDB, or a sharded relational database. The access pattern is mostly key-value lookup by user ID and notification type, so a distributed key-value or wide-column store is appropriate. To meet the 2-second latency target, preferences should also be cached in Redis, Memcached, or a local in-process cache with short TTLs.
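The partition-by-recipient rule and the batched fanout split described above can be sketched as below. This is a hedged illustration, not a specific Kafka client API: `partition_for` stands in for whatever partitioner the producer uses, and the batch size is an arbitrary example value.

```python
import hashlib

# Hash the recipient ID to a partition so one user's notifications stay
# ordered on a single partition (assumed partitioner, not a Kafka API).
def partition_for(recipient_user_id: str, num_partitions: int) -> int:
    digest = hashlib.md5(recipient_user_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Large fanout: split a big recipient list into fixed-size batches so one
# celebrity event becomes many small derived jobs spread across partitions.
def fanout_batches(recipients, batch_size=1000):
    for i in range(0, len(recipients), batch_size):
        yield recipients[i:i + batch_size]
```

The trade-off noted in the text shows up directly here: a celebrity recipient never becomes a hot key (each follower hashes independently), but a celebrity *sender* still requires the batching path so a single event does not block a partition.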
Preference updates are written to the source-of-truth database and propagated to caches through invalidation events. The trade-off is that cache staleness may cause a recently changed preference to take a few seconds to apply; if strict preference consistency is required, processors can read through to the database on cache miss or for recently updated users.

The template and personalization service renders notification content. It maps event types to templates such as “Alex liked your post” or “Maya commented: ...”. It handles localization, deep links, image URLs, and channel-specific payload constraints. Template definitions can be stored in a configuration database and cached aggressively because they change infrequently. Rendering should happen before delivery jobs are published so that each job is self-contained and can be retried safely.

The notification store is the source of truth for user-visible in-app notifications and delivery state. A good choice is Cassandra, DynamoDB, ScyllaDB, or another horizontally scalable store partitioned by recipient user ID and sorted by notification timestamp. The primary access pattern is “fetch the latest notifications for user X,” so the table can use recipient_user_id as the partition key and created_at or notification_id as the sort key. The service writes an in-app notification record before or atomically with publishing the in-app delivery job. Records include notification ID, recipient, type, content, status, read/unread state, timestamps, and deduplication key. This store guarantees that even if WebSocket delivery fails, the user can still see the notification when opening the app.

After preferences and templates are applied, the processor publishes jobs to separate channel queues: push queue, in-app queue, and email queue. Separating queues is important because each channel has different latency and reliability characteristics.
Push and in-app queues are latency-sensitive and should be provisioned for high throughput with minimal backlog. Email is less latency-sensitive and can tolerate longer delays, provider throttling, and batching. Separate queues also prevent a slow email provider from affecting push delivery.

Push delivery workers consume from the push queue and send notifications to Apple Push Notification service, Firebase Cloud Messaging, or other mobile push providers. Device tokens are stored in a device registry keyed by user ID, with token, platform, app version, locale, and last-seen timestamp. The registry can use a distributed key-value store and cache active tokens. Push workers must handle provider responses, remove invalid tokens, retry transient failures with exponential backoff, and record delivery attempts. Push provider acknowledgments do not guarantee that the user saw the notification, only that the provider accepted it, so the system should distinguish provider acceptance from actual user receipt.

In-app delivery has two paths. First, the notification is persisted in the notification store. Second, an in-app delivery worker sends it to the user’s currently connected devices through a real-time gateway. The gateway can be implemented using WebSockets, HTTP/2 streams, or a mobile push-like persistent connection infrastructure. Gateway nodes maintain user connection state in memory and publish presence information to a distributed presence service. A routing layer or Redis/NATS-based presence map tells the in-app worker which gateway node currently owns a user’s connection. If the user is offline or the gateway send fails, no notification is lost because the persisted notification will be fetched through the app’s notification inbox API on the next session. For low latency, gateway nodes should be regionally deployed close to users and the in-app queue should be processed by workers in the same region where possible.
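The presence map that routes in-app deliveries to the right gateway node can be sketched as below. This in-memory version stands in for the Redis/NATS-backed map described above; the method names and node identifiers are assumptions for illustration.

```python
# In-memory stand-in for a distributed presence map: which gateway node(s)
# currently hold a user's live connection(s).
class PresenceMap:
    def __init__(self):
        self._connections = {}  # user_id -> set of gateway node IDs

    def connect(self, user_id, gateway_node):
        # Gateway node registers itself when the user's socket opens.
        self._connections.setdefault(user_id, set()).add(gateway_node)

    def disconnect(self, user_id, gateway_node):
        nodes = self._connections.get(user_id)
        if nodes:
            nodes.discard(gateway_node)
            if not nodes:
                del self._connections[user_id]

    def gateways_for(self, user_id):
        # Empty result means the user is offline; the persisted notification
        # is then served from the inbox API on the next session.
        return sorted(self._connections.get(user_id, ()))
```

A user with phone and laptop open appears on two gateway nodes at once, which is why the value is a set rather than a single node ID.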
Email delivery workers consume from the email queue and send through providers such as SES, SendGrid, or Mailgun. They should support provider failover, bounce handling, suppression lists, unsubscribe compliance, and per-provider rate limits. Email notifications can be batched or digested for low-priority event types like likes, while direct messages or security-related events may be sent immediately. Because email is slower and more expensive, user preferences and rate limiting are especially important.

Reliability is achieved through durable writes, at-least-once processing, idempotency, retries, and dead-letter queues. The ingestion layer only acknowledges producers after the event is durably written to Kafka/Pulsar. Consumers commit offsets only after they have successfully written notification records and published downstream channel jobs.

Because retries can create duplicates, every event and notification must have stable idempotency keys. For example, a like notification key could be recipient_id + actor_id + post_id + event_type, while a comment notification key could include comment_id. The notification store enforces uniqueness on this key, or processors perform conditional writes. Delivery workers should also use attempt IDs and idempotent state transitions so that duplicate jobs do not create duplicate in-app records or duplicate emails where avoidable. The system guarantees at-least-once delivery, not exactly-once delivery, so clients should also deduplicate by notification ID.

Dead-letter queues are required for poison messages, malformed events, repeated provider failures, or records that cannot be rendered. A replay tool should allow operators to fix issues and reprocess events from the original durable log or from the dead-letter queue. Kafka retention should be long enough to support operational recovery, for example several days. Critical metadata and delivery state should also be persisted in the notification database for auditability.
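The idempotency-key construction and conditional-write deduplication described above can be sketched as follows. The in-memory store stands in for a conditional write against the real notification store (for example, an insert that succeeds only if the key is absent); the dict-based event shape is an assumption for the example.

```python
# Stable key per logical notification, following the scheme in the text:
# likes dedupe per (recipient, actor, object, type), comments per comment_id.
def idempotency_key(event):
    parts = [event["recipient_id"], event["actor_id"],
             event["object_id"], event["event_type"]]
    if "comment_id" in event:
        parts.append(event["comment_id"])
    return ":".join(parts)

class NotificationStore:
    def __init__(self):
        self._by_key = {}

    def insert_if_absent(self, key, record):
        # Stand-in for a conditional write: only the first write of this
        # key succeeds, so a redelivered job becomes a harmless no-op.
        if key in self._by_key:
            return False
        self._by_key[key] = record
        return True
```

Under at-least-once processing, a consumer crash after the write but before the offset commit redelivers the job; the second `insert_if_absent` returns False and no duplicate notification is created.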
To meet the scale requirement of 100 million daily active users and 50,000 notification requests per second, all major services should be horizontally scalable and stateless where possible. Ingestion APIs scale behind load balancers. Kafka/Pulsar topics are partitioned widely enough to support peak throughput and consumer parallelism. Processors and delivery workers run in autoscaling groups or Kubernetes deployments and scale based on queue lag, CPU, provider latency, and request rate. Databases are partitioned by user ID to spread load. Hot-key problems should be handled with sharded fanout jobs, celebrity-user special handling, and backpressure.

For extremely large fanout, the system may use pull-based fanout for low-priority notifications: instead of writing one notification per follower immediately, it stores the event once and materializes it when a user opens the app. This reduces write amplification but increases read complexity and may not be appropriate for direct messages or comments.

The 2-second latency target for 99% of push and in-app notifications is met by keeping the critical path short: producer to durable log, processor preference lookup from cache, notification record write, publish to channel queue, and immediate delivery by warm workers. Push and in-app workers should be overprovisioned for peak load, and queues should use priority lanes so direct messages and comments are processed ahead of low-priority likes.

Regional deployment reduces network latency. For users in multiple regions, routing can be based on the recipient’s home region, with cross-region replication for disaster recovery. The design should measure end-to-end latency from event creation to provider acceptance or gateway send, not merely internal processing time. Observability is essential.
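The priority-lane idea above can be sketched with a single priority queue where lower numbers drain first. The priority values per event type are assumptions chosen for the example; real systems might instead use physically separate high- and low-priority queues.

```python
import heapq
import itertools

# Assumed priority ordering: direct messages and comments ahead of likes.
PRIORITY = {"direct_message": 0, "comment": 1, "follow": 2, "like": 3}

class PriorityLanes:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: FIFO within a priority level

    def publish(self, event_type, job):
        heapq.heappush(self._heap, (PRIORITY[event_type], next(self._seq), job))

    def next_job(self):
        # Workers always drain the highest-priority (lowest-numbered) lane first.
        return heapq.heappop(self._heap)[2]
```

The monotonically increasing sequence number keeps ordering stable inside each lane, so two comments for the same user are still delivered in the order they were published.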
The platform should track request rate, queue lag by partition, processing latency, preference cache hit rate, notification creation failures, provider error rates, retry counts, dead-letter volume, WebSocket delivery success, push provider acceptance latency, and end-to-end p50/p95/p99 latency. Distributed tracing should carry the original event ID through every component. Alerts should fire on queue lag growth, elevated failures, provider throttling, database write latency, and missed latency SLOs.

There are several trade-offs. Kafka or Pulsar adds operational complexity but provides the durability, replay, and throughput needed for at-least-once delivery at this scale. Cassandra or DynamoDB provides high write scalability and availability but offers limited ad hoc querying, so analytics should be sent to a separate data warehouse. Caching preferences improves latency but introduces short-lived consistency issues. At-least-once delivery is much simpler and more reliable than exactly-once delivery, but the system must tolerate duplicates using idempotency keys and client-side deduplication. Fanout-on-write gives fast inbox reads but can be expensive for high-follower events; fanout-on-read saves writes but complicates ranking and read-time materialization. A hybrid model is usually best: fanout-on-write for direct, important, and normal-volume notifications, and fanout-on-read or batched fanout for massive low-priority events.

In summary, the service should use a durable event-driven architecture with partitioned message streams, stateless processors, cached preference lookups, persistent notification storage, separate channel queues, channel-specific workers, and strong retry/idempotency mechanisms.
This design scales horizontally to 50,000 notification requests per second, keeps push and in-app delivery within the 2-second p99 target through caching and low-latency queues, and guarantees that notifications are not lost by persisting events and notification records before acknowledging progress.
Result
Winning Votes
3 / 3
Overall Comments
Answer A presents a thorough, coherent high-level architecture with clear component responsibilities, realistic data flow, and strong attention to operational concerns. It directly addresses all required channels, latency goals, at-least-once delivery semantics, preference handling, large fanout scenarios, idempotency, retries, persistence, and observability. Its technology discussion is balanced and nuanced, with concrete trade-offs such as fanout-on-write vs fanout-on-read, caching consistency, and Kafka/Pulsar operational complexity. The main weakness is that it is somewhat long and could be more condensed, but technically it is strong and well aligned with the prompt.
Architecture Quality
Weight: 30%
The architecture is well structured and end-to-end: ingestion, durable log, processors, preference service, template service, notification store, per-channel queues, delivery workers, real-time gateway, and observability all fit together coherently. It also distinguishes persisted in-app state from real-time delivery and handles fanout as a first-class concern.
Completeness
Weight: 20%
It covers all required notification types, user preferences, scale, latency, reliability, technology choices, and trade-offs. It also addresses practical concerns that are often missed, such as the device registry, dead-letter queues, idempotency keys, fanout batching, regional deployment, observability, and recovery tooling.
Trade-off Reasoning
Weight: 20%
The answer gives strong comparative reasoning for Kafka/Pulsar, NoSQL choices, caching consistency, at-least-once vs exactly-once, and fanout-on-write vs fanout-on-read. These trade-offs are concrete and tied directly to workload and product behavior.
Scalability & Reliability
Weight: 20%
This is a major strength. The design clearly explains horizontal scaling, partitioning, queue isolation by channel, hot-key mitigation, retries, consumer offset handling, conditional writes for deduplication, dead-letter queues, replay, and durability before acknowledgment. It directly supports at-least-once delivery and the 2-second target with realistic mechanisms.
Clarity
Weight: 10%
The explanation is clear, logically ordered, and precise despite being long. It communicates the data flow well, though the length makes it slightly denser and less immediately scannable than a more structured response.
Overall Comments
Answer A provides an exceptionally detailed and robust system design. It demonstrates a deep understanding of complex distributed system challenges, such as fanout for celebrity accounts, specific idempotency key construction, and the nuances of at-least-once delivery. The architecture is highly granular, well-reasoned, and explicitly addresses all requirements with sophisticated solutions and trade-off discussions, reflecting the expertise expected from a senior software engineer.
Architecture Quality
Weight: 30%
Answer A presents a highly detailed and logical architecture, clearly separating concerns and providing robust solutions for complex scenarios like large-scale fanout and two-path in-app delivery. The component interactions are well-defined.
Completeness
Weight: 20%
Answer A comprehensively addresses all requirements, including advanced topics such as specific idempotency key examples, detailed observability, and nuanced fanout strategies (on-write vs. on-read), demonstrating a very complete understanding.
Trade-off Reasoning
Weight: 20%
Answer A integrates trade-off discussions throughout the design and explicitly highlights fundamental system design trade-offs (e.g., at-least-once vs. exactly-once, fanout strategies), showcasing a deep understanding of implications beyond just technology choices.
Scalability & Reliability
Weight: 20%
Answer A provides excellent coverage of both scalability and reliability, detailing specific mechanisms like partitioning strategies, consumer offset commits, durable writes before acknowledgment, hot-key handling, and priority queues, demonstrating a strong grasp of implementation details.
Clarity
Weight: 10%
Answer A is very clear, well-structured, and uses professional language, making the complex design easy to follow despite its depth. The logical flow is excellent.
Overall Comments
Answer A delivers a deeply reasoned, prose-driven system design that engages with subtle, important issues: hot partitions for celebrity fanout, fanout-on-write vs fanout-on-read hybrid, idempotency key construction, presence routing for WebSockets, regional deployment, priority lanes for queues, and the distinction between provider acceptance and user receipt. Trade-offs are discussed in context rather than listed superficially. The narrative is long but coherent and demonstrates senior-level depth. Minor weaknesses: lacks a visual diagram and structured headings/tables that would aid scanning.
Architecture Quality
Weight: 30%
Comprehensive component breakdown with sophisticated handling of fanout, partitioning by recipient, presence routing, separate channel queues, and persistent notification store as source of truth. Treats subtle issues like celebrity fanout and priority lanes.
Completeness
Weight: 20%
Covers ingestion, durable log, processors, preferences, templates, notification store, channel queues, workers, WebSocket gateway, device registry, DLQ, observability, regional deployment, and explicit handling of all four requirements.
Trade-off Reasoning
Weight: 20%
Discusses concrete trade-offs in context: at-least-once vs exactly-once, fanout-on-write vs fanout-on-read hybrid, cache staleness vs consistency, partitioning by recipient vs event ID, Kafka operational complexity vs durability benefits.
Scalability & Reliability
Weight: 20%
Strong reliability story: durable writes before ack, offset commits after downstream success, idempotency keys with concrete examples, DLQ with replay tooling, hot-key mitigation, regional deployment for latency, priority lanes.
Clarity
Weight: 10%
Well-structured prose but very long with few visual aids; dense paragraphs make scanning harder despite logical flow.