Design a URL Shortening Service at Scale

Compare model answers for this System Design benchmark and review scores, judging comments, and related examples.

X f L

Contents

Task Overview

Benchmark Genres

System Design

Task Creator Model The task creator is randomly selected from top task-generation models of supported providers.

Anthropic Claude Opus 4.6

Answering Models In this benchmark, models from the same provider as the task creator are excluded from answering.

Answer A Google Gemini 2.5 Pro

Answer B OpenAI GPT-5 mini

Judge Models Judging uses exactly 3 judge models, excluding the answering models. At least 1 judge is selected from flagship models, lightweight models are not selected as judges, and the 3 judges come from 3 distinct providers.

OpenAI GPT-5.4 Anthropic Claude Sonnet 4.6 Google Gemini 2.5 Flash

Task Prompt

Show more ▼

You are tasked with designing a URL shortening service (similar to bit.ly or tinyurl.com) that must handle the following constraints: 1. The service must support 100 million new URL shortenings per month. 2. The read-to-write ratio is 100:1 (i.e., 10 billion redirects per month). 3. Shortened URLs must be at most 7 characters long (alphanumeric). 4. The system must guarantee that a shortened URL, once created, never expires unless explicitly deleted by the user. 5. Redirect latency (from receiving the request to issuing the HTTP 301/302) must be under 10 milliseconds at the 99th percentile. 6. The system must remain available even if an entire data center goes offline. 7. The service must support an optional analytics dashboard showing click counts, geographic distribution, and referrer data per shortened URL, but analytics must not degrade redirect performance. Provide a comprehensive system design that addresses: A. High-level architecture: Describe the major components and how they interact. B. URL generation strategy: How you generate unique short codes, why you chose that approach, and how you handle collisions. C. Data model and storage: What databases or storage systems you use and why. Include schema considerations. D. Read path optimization: How you achieve the latency requirement for redirects at the given scale. E. Write path: How new URLs are created and persisted reliably. F. Scaling strategy: How the system scales horizontally to handle growth. G. Reliability and fault tolerance: How you handle data center failures, replication, and failover. H. Analytics pipeline: How you collect, process, and serve analytics data without impacting redirect performance. I. Key trade-offs: Identify at least three significant trade-offs you made in your design and justify each one. Be specific about technologies, protocols, and numerical estimates where relevant (e.g., storage calculations, QPS estimates, cache sizes).

Judging Policy

Show more ▼

A strong response should be evaluated on the following dimensions: 1. Completeness: Does the answer address all nine sections (A through I) explicitly? Missing sections should be penalized. 2. Numerical rigor: Does the answer include back-of-the-envelope calculations such as QPS estimates (reads and writes), storage requirements over time, cache sizing, and short-code keyspace analysis? Vague answers without numbers are weaker. 3. Architecture coherence: Do the components fit together logically? Are data flows clearly described? Is it clear how a request travels through the system from client to response? 4. URL generation strategy: The approach should be well-reasoned (e.g., base62 encoding of a counter, pre-generated key service, hash-based with collision handling). The answer should explain why the chosen method works at scale and how collisions are avoided or resolved. 5. Read path optimization: The answer should describe a caching layer (e.g., Redis, Memcached, or CDN-level caching) with justification for cache hit ratios and eviction policies. Simply saying 'use a cache' without detail is insufficient. 6. Reliability: Multi-region replication, failover mechanisms, and consistency model choices (eventual vs. strong) should be discussed with clear reasoning. 7. Analytics separation: The analytics pipeline should be clearly decoupled from the redirect hot path (e.g., async logging, message queues, stream processing). Designs that couple analytics with redirects should be penalized. 8. Trade-off reasoning: At least three trade-offs should be explicitly identified and justified. Examples include consistency vs. availability, cache staleness vs. latency, storage cost vs. TTL policies, or complexity vs. performance. Superficial trade-offs without justification are weaker. 9. Technology specificity: Naming concrete technologies (e.g., Cassandra, DynamoDB, Kafka, Redis Cluster) with reasons for choosing them is stronger than generic descriptions. 10. Clarity and organization: The answer should be well-structured and easy to follow, with clear section headers or logical flow.

Answers

Answer A: Google Gemini 2.5 Pro

A. High-level Architecture

The system is designed as a set of distributed, stateless microservices deployed across multiple geographic regions to ensure high availability and low latency. The main components are:

Global Load Balancer (GLB): Distributes user traffic to the nearest and healthiest region using DNS-based routing (e.g., AWS Route 53, Google Cloud Load Balancing).
Regional Load Balancers (RLB): Distribute traffic within a single region to the fleet of web servers.
API Gateway / Web Servers: A layer of stateless servers that terminate TLS and handle incoming HTTP requests f...

Show Full Answer ▼

A. High-level Architecture

The system is designed as a set of distributed, stateless microservices deployed across multiple geographic regions to ensure high availability and low latency. The main components are:

Global Load Balancer (GLB): Distributes user traffic to the nearest and healthiest region using DNS-based routing (e.g., AWS Route 53, Google Cloud Load Balancing).
Regional Load Balancers (RLB): Distribute traffic within a single region to the fleet of web servers.
API Gateway / Web Servers: A layer of stateless servers that terminate TLS and handle incoming HTTP requests for both writes (shortening) and reads (redirection).
Redirect Service (Read Path): A highly optimized service responsible for looking up short codes and issuing HTTP 301/302 redirects. It interacts primarily with the caching layer.
URL Shortening Service (Write Path): A service that handles the creation of new short URLs. It coordinates with the Key Generation Service and the primary database.
Distributed Cache: An in-memory cache (e.g., Redis Cluster) in each region that stores hot URL mappings to meet the strict low-latency requirement for redirects.
Primary Database: A distributed NoSQL database (e.g., Apache Cassandra, Amazon DynamoDB) that serves as the persistent source of truth for all URL mappings, replicated across all regions.
Key Generation Service (KGS): A dedicated, highly available service that pre-generates batches of unique, 7-character short codes to eliminate write-time collisions and latency.
Analytics Pipeline: An asynchronous data pipeline starting with a message queue (e.g., Apache Kafka) to ingest clickstream data without impacting the performance of the redirect service. This data is then processed and stored in a separate analytics database.

B. URL Generation Strategy

Approach: We will use a dedicated Key Generation Service (KGS) to pre-generate unique keys.

Mechanism:

The KGS maintains a counter in a distributed, fault-tolerant manner (e.g., using ZooKeeper or an atomic counter in a database like Redis).
It generates large, sequential numeric IDs. To ensure high availability, multiple KGS instances can be run, each responsible for a different range of IDs (e.g., Server 1 handles 1 to 1,000,000, Server 2 handles 1,000,001 to 2,000,000).
Each numeric ID is then converted to a base-62 string ([a-z, A-Z, 0-9]) to produce the 7-character short code. A 62^7 space provides ~3.5 trillion unique codes, which is more than sufficient.
The KGS generates these codes in batches and places them into a queue (e.g., a Redis list) for the URL Shortening Service to consume.

Justification: This approach avoids the need to check for collisions in the main database during a write operation, which would be slow and a point of contention. It makes the write path extremely fast and predictable, as the Shortening Service simply needs to fetch a guaranteed-unique key from the KGS.

C. Data Model and Storage

Primary Storage (URL Mappings):

Technology: Apache Cassandra or Amazon DynamoDB.
Why: These NoSQL databases offer excellent horizontal scalability, native multi-region replication, high availability, and low-latency key-value lookups, which perfectly match our requirements for both scale and fault tolerance.
Schema:
- Table Name: url_mappings
- Partition Key: short_code (string)
- Columns:
  - long_url (string)
  - user_id (string, for ownership)
  - created_at (timestamp)

Cache Storage:

Technology: Redis Cluster.
Why: Redis provides extremely low-latency (sub-millisecond) in-memory data access, which is essential for meeting the <10ms redirect requirement. It can be clustered for scalability and high availability.

Analytics Storage:

Technology: A column-oriented database like Apache Druid, ClickHouse, or a cloud data warehouse like Google BigQuery.
Why: These systems are optimized for fast aggregations and analytical queries over massive datasets, which is ideal for powering the analytics dashboard.

D. Read Path Optimization

The read path is heavily optimized for latency using a multi-layered caching strategy to handle the 40,000 peak QPS.

CDN/Edge Cache: For extremely popular URLs, a CDN can cache the 301/302 redirect response at edge locations, serving users from the closest point of presence without hitting our core infrastructure.
Distributed In-Memory Cache (Redis): This is the primary workhorse for low latency. The Redirect Service first queries the regional Redis cluster. A cache hit results in an immediate redirect.
- Cache Size Estimate: To cache 20% of 5 years of URLs (100M/month * 12 * 5 * 0.2 = 1.2B URLs). At ~600 bytes per entry (short code, long URL, overhead), this requires approximately 720 GB of RAM per region, which is feasible for a Redis cluster.
Database Lookup: On a cache miss, the service queries the primary Cassandra/DynamoDB database. The result is then written back to the Redis cache with a Time-To-Live (TTL) to ensure the cache remains filled with frequently accessed items.

This architecture ensures that the vast majority of requests are served from memory, easily meeting the <10ms p99 latency goal.

E. Write Path

The write path is designed for reliability and throughput (~400 peak writes/sec).

A user sends a POST request with the long URL to the API Gateway.
The request is routed to an instance of the URL Shortening Service.
The service validates the long URL.
It requests a unique short code from the Key Generation Service (KGS).
The service writes the new mapping (short_code, long_url) to the primary database (Cassandra/DynamoDB) with a high consistency level for the local region.
The database asynchronously replicates this write to other regions.
Upon successful database write, the service returns the new short URL to the user.

F. Scaling Strategy

The system is designed to scale horizontally at every layer:

Stateless Services: The API Gateway, Redirect Service, and Shortening Service are stateless. We can scale them by adding more virtual machines or containers behind the load balancers.
Database: Cassandra and DynamoDB are designed to scale horizontally by adding more nodes to the cluster. The data and load are automatically rebalanced.
Cache: Redis Cluster can be scaled by adding more nodes and re-sharding the keyspace.
Analytics Pipeline: Kafka scales by adding more brokers and partitions. The consumer services that process the data can also be scaled horizontally.

G. Reliability and Fault Tolerance

High availability is achieved through redundancy across multiple geographic regions (e.g., US-East, US-West, EU-West).

Multi-Region Deployment: The entire stack is deployed in at least two independent regions.
Global Failover: The Global Load Balancer continuously monitors the health of each region. If an entire region becomes unavailable, the GLB automatically redirects all traffic to the healthy regions.
Data Replication: The primary database (Cassandra/DynamoDB) is configured for multi-region replication. This ensures that if one regional database is lost, the data is preserved in other regions. Writes can continue to be served in the remaining active regions.
No Single Point of Failure: Every component, from the load balancers to the KGS and databases, is deployed in a clustered, fault-tolerant configuration.

H. Analytics Pipeline

The analytics pipeline is designed to be completely decoupled from the critical redirect path to prevent any performance degradation.

Data Ingestion: After successfully serving a redirect, the Redirect Service publishes a message to a Kafka topic. This message contains event data like short_code, timestamp, IP address, User-Agent, and Referrer. This is an asynchronous, non-blocking operation.
Data Processing: A separate fleet of stream processing workers (e.g., using Spark Streaming, Flink, or custom consumers) reads messages from the Kafka topic in batches.
Data Enrichment & Aggregation: These workers enrich the data (e.g., converting IP to geo-location) and perform real-time aggregations (e.g., incrementing click counts per hour, per country).
Data Storage: The processed and aggregated data is loaded into the analytics database (ClickHouse/Druid), which is optimized to power the user-facing dashboard queries efficiently.

This asynchronous architecture ensures that even a massive spike in traffic or a failure in the analytics pipeline will have zero impact on the speed and availability of URL redirections.

I. Key Trade-offs

Eventual Consistency vs. Strong Consistency: We chose an eventually consistent data replication model across regions. A URL created in one region might take milliseconds to be available in another. This trade-off significantly improves write latency and system availability, which are more critical for this application than immediate global consistency.
Cost vs. Performance: We are using a large, expensive in-memory Redis cache to meet the strict <10ms p99 latency requirement. A cheaper, disk-based solution would not be able to satisfy this core performance SLA. We are trading higher operational cost for guaranteed low latency on the critical read path.
Complexity vs. Scalability (KGS): We introduced a separate Key Generation Service (KGS) instead of generating hashes or random strings on the fly. This adds an extra component to build and maintain, increasing system complexity. However, it completely eliminates database hotspots and collision-checking logic on the write path, which is a crucial trade-off to ensure high throughput and low-latency writes at scale.

Result

Winning Votes

0 / 3

Average Score

Judge Models Google Gemini 2.5 Flash

Total Score

Overall Comments

Answer A provides a solid and comprehensive system design for a URL shortening service. It covers all required sections, proposes a logical high-level architecture with standard components, and includes reasonable numerical estimates for storage and QPS. The URL generation strategy using a Key Generation Service (KGS) and base-62 encoding is well-justified for scalability and collision avoidance. The read path optimization leverages multi-layered caching effectively, and the analytics pipeline is correctly decoupled. The discussion on reliability and fault tolerance is adequate, and the identified trade-offs are relevant. However, some areas could benefit from more granular detail and a slightly more advanced approach, particularly in the read path and analytics event generation.

View Score Details ▼

Architecture Quality

Weight 30%

The architecture is solid, with clear components like GLB, KGS, and separate read/write services. The interaction flow is logical, and the choice of distributed NoSQL and Redis is appropriate. It's a well-structured, standard microservices approach.

Completeness

Weight 20%

All nine required sections (A-I) are explicitly addressed, providing a comprehensive overview of the design. No major sections are missing.

Trade-off Reasoning

Weight 20%

Three significant trade-offs are identified (Eventual Consistency vs. Strong Consistency, Cost vs. Performance, Complexity vs. Scalability with KGS) and justified clearly, showing an understanding of design compromises.

Scalability & Reliability

Weight 20%

The answer discusses horizontal scaling for all layers and outlines multi-region deployment with global load balancing and data replication. It correctly identifies the need for no single point of failure.

Clarity

Weight 10%

The answer is well-structured with clear headings and bullet points, making it easy to follow the design components and their interactions.

Judge Models Anthropic Claude Sonnet 4.6

Total Score

Overall Comments

Answer A is a solid, well-organized response that covers all nine required sections with clear headers and logical flow. It correctly identifies the major components, uses appropriate technologies (Cassandra/DynamoDB, Redis, Kafka, ClickHouse), and provides a coherent architecture. The URL generation strategy using KGS with base-62 encoding is well-explained. However, the numerical rigor is somewhat limited: the cache sizing calculation is questionable (caching 20% of 5 years of URLs at 720GB seems excessive and not well-justified), QPS estimates are mentioned briefly but not derived step-by-step, and storage estimates are absent. The trade-offs are reasonable but somewhat generic. The read path optimization is good but lacks the CDN-first edge caching layer that would be the primary mechanism for achieving sub-10ms p99 at this scale. Overall a competent answer but missing depth in quantitative analysis.

View Score Details ▼

Architecture Quality

Weight 30%

Answer A presents a coherent multi-region architecture with appropriate components (GLB, RLB, API Gateway, Redirect Service, KGS, Redis, Cassandra, Kafka). The data flow is clear. However, it underemphasizes CDN-level edge caching as the primary latency optimization, which is the most important mechanism for achieving <10ms p99 at global scale. The KGS design is well-reasoned. The read path relies primarily on Redis rather than CDN, which is a meaningful architectural gap.

Completeness

Weight 20%

Answer A addresses all nine required sections (A through I) with clear headers. However, storage estimates are absent, QPS derivations are brief, and the cache sizing calculation (720GB) appears inflated and poorly justified. The write path and scaling sections are somewhat thin. All sections are present but some lack depth.

Trade-off Reasoning

Weight 20%

Answer A identifies three trade-offs: eventual vs. strong consistency, cost vs. performance (Redis), and complexity vs. scalability (KGS). These are relevant and correctly identified, but the justifications are somewhat generic and brief. The consistency trade-off could be more specific about the implications for the user experience.

Scalability & Reliability

Weight 20%

Answer A covers multi-region deployment, global failover via GLB, Cassandra/DynamoDB multi-region replication, and horizontal scaling of stateless services. The reliability section is adequate but lacks specifics on failover timing, replication lag, and consistency model choices during failover. The KGS availability during region failure is not addressed.

Clarity

Weight 10%

Answer A is well-organized with clear section headers matching the required A-I structure. The writing is concise and easy to follow. Each section is focused and not overly verbose. The schema is presented clearly. This is one of Answer A's strongest dimensions.

Judge Models OpenAI GPT-5.4

Total Score

Overall Comments

Answer A is well-structured and explicitly covers sections A through I with sensible component choices such as Redis, Cassandra/DynamoDB, Kafka, and a separate analytics store. It demonstrates solid understanding of multi-region deployment, caching, and asynchronous analytics separation. However, it is weaker on numerical rigor and specificity in critical areas: some estimates are sparse or inconsistent, write and read QPS assumptions are only partially developed, the cache sizing logic is not tied to an expected hit-rate model, and the URL generation service relies on a somewhat hand-wavy KGS design using Redis/ZooKeeper without enough detail on failure handling or allocator correctness. Reliability discussion is generally sound but high level, especially around replication semantics, failover behavior, and cross-region consistency. Trade-offs are present and reasonable but not deeply explored.

View Score Details ▼

Architecture Quality

Weight 30%

The architecture is coherent and covers the major services expected in a scalable URL shortener: load balancers, stateless services, cache, primary store, key generation, and analytics. Request flow is understandable, but some parts remain generic, especially KGS behavior, failover interactions, and cache invalidation details.

Completeness

Weight 20%

It explicitly addresses sections A through I and touches all required areas, including analytics and trade-offs. However, some requested sub-details are light, especially schema richness, numerical estimates, collision handling details, and explicit handling of deleted URLs or dashboard serving behavior.

Trade-off Reasoning

Weight 20%

It names three valid trade-offs such as eventual consistency, cache cost, and KGS complexity. The reasoning is correct but fairly standard and brief, without much exploration of alternative designs or operational implications.

Scalability & Reliability

Weight 20%

The answer demonstrates sound horizontal scaling ideas and a reasonable multi-region reliability posture. Still, it is somewhat high level on replication modes, failover mechanics, KGS fault tolerance, and how the system behaves under cache cold starts or region loss beyond generic redirection of traffic.

Clarity

Weight 10%

The answer is easy to follow, neatly divided by the required sections, and generally concise. Some explanations are broad rather than precise, which slightly reduces clarity when evaluating implementation realism.

Answer B: OpenAI GPT-5 mini

Summary
I propose a globally distributed, cache-first URL shortener with: a lightweight edge redirect path (CDN + edge function + local cache) to meet the <10 ms p99 redirect latency requirement; a durable, multi-region primary store for canonical mappings; a distributed ID allocator to generate unique short codes encoded in base62 (<=7 chars); an async analytics pipeline (Kafka/Kinesis → stream processors → OLAP store) that is strictly decoupled from the redirect path; and multi-region replication and failover for availability during a full data center outage.

A. High-level architecture (components and interact...

Show Full Answer ▼

A. High-level architecture (components and interactions)

Edge layer (CDN): Cloud CDN (e.g., Amazon CloudFront, Fastly) fronts all GET /{code} redirect requests. Most requests are served from CDN edge cache with a cached 301/302 response.
Edge function (CloudFront Function / Fastly Compute / Lambda@Edge): if cache miss, a tiny edge function invokes a regional Redirect API (via a short RPC/HTTP) to fetch the canonical destination URL and return a 301/302. The edge function adds minimal logic to choose correct cache TTL and handle deleted/missing codes.
Regional Redirect API / Read Cache tier: stateless read servers in each region that first consult a regional in-memory cache (Redis cluster / ElastiCache or Memcached) then fall back to the durable key-value store if needed.
Durable key-value store (Primary DB): DynamoDB Global Tables or Cassandra (multi-region) that store short_code -> long_url and metadata as the canonical source of truth.
ID Allocation service (Range allocator): a small service that hands out ID blocks to Write API servers for local short code generation (ensures uniqueness without per-write central lock).
Write API: service that handles creation requests, reserves/generates the short_code (via ID block), persists to primary DB, and propagates an invalidation to caches and CDN. Writes are synchronous to primary DB for durability.
Cache invalidation & propagation: on create/update/delete, Write API updates regional caches and invalidates CDN edge entries via CDN invalidation API or by writing cache-control headers and using a version token in URL if necessary.
Analytics pipeline (async): redirect events are recorded asynchronously (not in the synchronous redirect path). Edge loggers or regional read servers push lightweight click events to a message bus (Kafka / Kinesis). Stream processing (Flink / Spark Streaming) aggregates and writes to an analytics store (ClickHouse / BigQuery) and to pre-aggregated counters in a read-optimized store for dashboards.
Dashboard API & UI: reads aggregated analytics from OLAP/aggregates stores and serves user dashboards. Dashboard queries never hit the redirect path.

B. URL generation strategy
Goals: uniqueness, compactness (<=7 alphanumeric chars), high throughput, low contention.
Chosen approach: sequential numeric IDs allocated in distributed ID blocks, base62-encoded to produce the short code.

Why base62 sequential IDs: base62 (a–z, A–Z, 0–9) gives 62^7 ≈ 3.52 × 10^12 possible codes with up-to-7-char strings — far more than expected lifetime needs. Sequential IDs encode compactly and are easy to reverse to numeric values if needed. Deterministic mapping avoids hashing collisions.
ID allocation implementation: a central allocator hands out monotonically increasing ID ranges (e.g., blocks of 1M) to each write-server cluster. Each write server issues local IDs from its block without remote coordination, ensuring uniqueness and very low latency. The allocator itself is small and can be backed by a highly-available store (RDS or a lightweight ZooKeeper/etcd based counter) and only used for block refills (low QPS).
Encoding: numeric ID -> base62 string. If numeric ID < 62^7, encoding length is <=7. With 100M new shortenings/month (1.2B/year), 62^7 capacity gives >2,900 years of space.
Handling collisions: none expected because numeric IDs are unique. Still, the Write API uses a conditional insert against the primary DB (PUT with primary key == short_code) and retry on rare primary DB conflict (shouldn't happen if allocator is correct). For user-requested custom aliases, check and return error if already taken.
Optional deduplication: optionally maintain a secondary hash index (e.g., SHA-256 of long_url) to return an existing short code if the same URL was shortened before by same user; this is application-level behavior and is optional.

C. Data model and storage
Primary data store choices: DynamoDB Global Tables (managed, multi-region, single-digit ms read/write), or Apache Cassandra / ScyllaDB (self-managed multi-region) as the canonical choices. I recommend DynamoDB Global Tables for faster ops and simpler multi-region replication unless you must be cloud-agnostic.
Primary mapping table schema (key-value optimized):

Table name: URL_Mapping
Primary key: short_code (string, PK)
Attributes: long_url (text), user_id (string), created_at (timestamp), custom_alias_flag (bool), deleted_flag (bool), metadata (JSON/Sparse map), analytics_enabled (bool), version (int)
Secondary indexes (optional): user_id -> list of short_codes (GSI) for management UI; long_hash -> short_code (for de-duplication if desired)
Storage estimates: assume each record stores 200 bytes on average (short_code ~7 bytes, URL avg 200? but we can compress; assume 200–400 bytes conservative). At 100M/month new rows: 100M * 200 B = 20 GB/month. Yearly ≈ 240 GB. Ten-year ≈ 2.4 TB. DynamoDB/Cassandra can easily handle this scale.
Analytics stores: raw click events go into append-only systems (Kafka/Kinesis) then into a long-term analytics store (ClickHouse or BigQuery) for aggregation and dashboards. Pre-aggregated counters (per short_code per timebucket) can be stored in ClickHouse and hot counters cached in Redis for dashboard queries.

D. Read path optimization (achieving <10 ms p99 redirects)
Objective is to serve 99th percentile redirects under 10 ms from request arrival to issuing 301/302.
Techniques used:

CDN + Edge Cache (primary optimization): cache full 301 responses at CDN edges for nearly all requests. Set very long TTLs (effectively non-expiring) because mappings do not expire, but support immediate invalidation when a mapping is updated/deleted.
- With CDN edge hit, latency to client typically <10 ms globally.
Very small edge function for cache-miss lookup: CloudFront Function or Fastly’s edge compute to minimize runtime overhead (~sub-ms). If cache miss, the edge function calls a regional Redirect API via a short TCP connection (keepalive) and returns 301 to CDN.
Regional read cache (Redis in each region): Regional cache is a memory-first store for mapping lookups; typical Redis GET <1 ms. Cache hit rate target: >=99% for hot codes. Use LFU/LRU eviction and size to hold working set.
- Cache sizing example: assume peak global RPS = 40k reads/sec; working set of top 50M codes (hot tail) — store short_code->long_url pairs (avg 200 bytes). Memory = 50M * 200 B ≈ 10 GB. A modest multi-shard Redis cluster (e.g., 4-8 nodes of 32 GB each) per region can handle this.
Origin DB access only on cache miss: DynamoDB single-row GetItem is typically low ms (1–10 ms) but we design to avoid being in critical path for p99 by caching heavily.
Keep the edge function + HTTP path minimal: use HTTP/2 or HTTP/3 between CDN and origin to reduce handshake latency and enable connection reuse.
Local anycast + geo-aware routing: send client to nearest edge/region to keep RTT low.
Measurement and SLA: test with synthetic traffic and 99th percentile latency budgets allocated: CDN hit (target <5 ms), edge function + Redis <10 ms, origin fallback acceptable for low percentiles but will be monitored and tuned.

E. Write path: creation and persistence

Client makes POST /create (or via UI) to Write API (region-aware endpoint). Write API layer is stateless and autoscaling.
Write API obtains a numeric ID from its locally allocated block (range allocator). If block exhausted, requests a new block from the allocator.
Encode numeric ID to base62 short_code.
Persist to primary DB with a conditional insert: PutItem(short_code, long_url, metadata) with conditional expression that short_code does not exist (prevents accidental overwrite of custom alias). Ensure atomic write for durability.
Upon successful write:
- Update regional read cache (write-through) so subsequent redirects hit the cache.
- Send a CDN cache pre-warm request or publish an invalidation for this short_code to the CDN so the new mapping is immediately cached at edges (or leverage the CDN’s cache-control + versioning to make it effective).
Return created short_code to user.
Durability and consistency: synchronous write to primary DB with replication across regions (DynamoDB global tables or Cassandra with multi-DC replication). If using DynamoDB, consistency model can be eventually consistent for reads but writes are durable and replicated.
Operational numbers: writes QPS average ~40 writes/sec (100M/month ≈ 38.6/s). Peak bursts possible; the write pipeline is easy to scale horizontally.

F. Scaling strategy (horizontal scaling)

Stateless API / Redirect servers: autoscale horizontally behind a global load balancer (ALB / GCLB). Keep servers stateless so they are easy to scale.
ID allocator: low QPS—scale by making it fault-tolerant (active/passive + persisted counter or delegated range allocation). Allocate larger blocks to reduce allocator load.
Caches: Redis/ElastiCache clusters per region, sharded (consistent hashing). Add shards to grow memory throughput.
Primary DB: DynamoDB auto-scaling or Cassandra cluster that can add nodes and grow throughput. Choose instance sizes and replication factor to meet read/write capacity.
Message bus (Kafka/Kinesis): partition the click stream by short_code hash to scale ingestion. Use enough partitions to meet peak throughput (e.g., if peak redirects 38k RPS generating 38k events/sec, provision Kafka partitions and brokers to handle ~50–100k events/s with replication factor 3).
Analytics compute: scale Flink / Spark clusters horizontally based on event volume.
CDN scales automatically; CDN config should be tuned for request rates.

G. Reliability and fault tolerance
Goals: survive an entire data center going offline, ensure no data loss.

Multi-region deployment: deploy at least two active regions (multi-active) with global routing. Use global load balancing + health checks to route around failed region.
Primary DB replication: DynamoDB Global Tables provide active-active replication across regions; Cassandra/Scylla can be configured with replication factor across data centers. This provides durability if one DC is lost.
Regional caches are warm on failure: when a region fails, traffic routed to next region which will have its own cache. Cold-fronting a region after failover may cause more origin reads until caches warm but availability is preserved.
ID allocator fault tolerance: allocator state persisted in a highly-available store; allocate large blocks to reduce need to access allocator under failover.
Message bus replication: Kafka with replication factor >=3 across racks/regions or managed Kinesis with cross-region replication for durability.
Health checks and automated failover: active monitoring, circuit breakers, rate-limiting to prevent overload during failover.
Backups: periodic snapshots of primary DB and exports of metadata. For DynamoDB, enable point-in-time recovery; for Cassandra, scheduled snapshots.

H. Analytics pipeline (collect/process/serve without impacting redirects)
Design principle: analytics writes must be asynchronous and never block redirects.

Lightweight event generation: edge (CDN/edge function) emits a small event for each redirect (short_code, timestamp, client_ip or geo tag, referrer, user-agent). To minimize redirect latency, emitting is done via a very fast UDP-like push to a local proxy or via batching in memory in the read server and sent asynchronously.
Message bus: events are pushed to Kafka or Kinesis topics partitioned by short_code (or hashed) to support scalable parallel processing. The producer must be asynchronous and non-blocking; if the local Kafka buffer is full, fall back to sampling or drop low-value fields to ensure redirect latency is not affected. Producers use local buffering and backpressure policies.
Stream processing: Flink / Kafka Streams / Spark Streaming consumes events, enriches (geo-IP lookup, UA parsing), and computes real-time aggregates (click counts, geo distribution, referrers) at minute/hour granularity. Pre-aggregate per short_code per time window.
OLAP store & aggregates: write aggregated data to ClickHouse or BigQuery for fast analytical queries and long-term storage. For serving dashboards, store recent aggregates in a fast read store (Redis or Druid) for interactive queries.
Dashboard API: reads only from aggregates/OLAP store; no direct query from analytics store to redirect path. Implement rate limits and per-user quotas for dashboard queries.
Sampling and tiered logging: for extremely high-traffic short_codes, optionally sample events to reduce pipeline load while preserving representative analytics.
Performance example: peak redirects ~40k/s -> events 40k/s -> Kafka ingest easily manageable with 100 partitions and a few brokers. Stream processors scale horizontally to handle this volume and write summarized aggregates every minute to ClickHouse.

I. Key trade-offs (at least three) with justification

CDN edge-caching vs immediate consistency for deletes/updates
- Trade-off: Using long-lived CDN edge cache yields very low redirect latency but makes update/delete propagation slightly more complex and not instant everywhere.
- Justification: Redirect latency SLA is strict (<10 ms p99). Typical operations to modify/delete a short URL are rare compared to redirects. We prefer ultra-low latency for redirects and accept slightly delayed cache invalidation (we provide immediate invalidation APIs for important cases). We also include a version token or short TTL invalidation strategy for critical updates.
Sequential ID allocation (range blocks + base62) vs fully random/hashing scheme
- Trade-off: Sequential allocation with blocks requires a small allocator service and careful handling during cross-region failover, while hashing/random generation can be fully stateless but requires collision resolution or larger code length to reduce collisions.
- Justification: Sequential IDs produce compact, very short codes deterministically and avoid collision handling at write time. With block allocation, allocator load is minimal and scale-friendly. Given the extremely large keyspace (62^7) we don’t need randomness to avoid collisions.
DynamoDB (managed, multi-region) vs self-hosted Cassandra
- Trade-off: DynamoDB simplifies operational burden and gives managed multi-region replication, but may be more expensive and somewhat less flexible in query patterns compared to self-hosted Cassandra/Scylla.
- Justification: The system’s access pattern is simple primary-key reads/writes; we value operational simplicity, reliability, and automatic scaling and therefore recommend DynamoDB Global Tables unless cost constraints force Cassandra.
Synchronous write-through caching vs eventual propagation to caches
- Trade-off: Synchronous write-through to caches increases write latency slightly but ensures immediate availability at read caches; eventual propagation lowers write latency but may cause short window where redirects miss.
- Justification: Writes are low QPS (≈40/s), so performing a cache write on creation is acceptable to ensure newly created links are immediately available for redirects and avoid cold cache misses for newly created short codes.

Operational considerations and telemetry

Monitoring: p99 redirect latency, cache hit ratio, origin RPS, write latency, message bus lag, analytic pipeline lag. Alerting for cross-DC replication lag.
Capacity planning: plan for 10x peak safety. Example numbers:
- Read QPS (avg): 3,858 RPS = 10B/month. Peak assumption 10x => ~38.6k RPS.
- CDN edge hit target: 99% → origin RPS ≈ 386 RPS. Regional caches sized to keep hot working set.
- Cache sizing: to hold top 50M keys at 200 B each → 10 GB; use multiple shards per region.
- Kafka throughput: at peak 40k events/s, with average event size 500 B => ~20 MB/s ingest; with replication factor 3 and overhead plan for ~60 MB/s cluster throughput.
- Primary DB throughput: writes ~40/s, reads from origin after cache misses ~386/s aggregate (or per-region splits). DynamoDB capacity auto-scaling easily supports these levels.

Security and operational features

Rate limiting and abuse detection at edge to mitigate spam or bots.
Throttling of analytics writes during overload, sampling for extremely hot codes.
Access controls (OAuth/API tokens) for creating and managing short URLs.
Audit logs for deletions to meet the non-expiry guarantee unless explicitly deleted.

Conclusion
This design prioritizes redirect latency and availability via CDN edge caching + minimal edge logic, durable multi-region storage for permanent mappings, a simple and robust ID allocation and base62 encoding scheme (guaranteeing no collisions and <=7 chars), and a completely asynchronous analytics pipeline so that analytics never degrade redirect performance. The architecture scales horizontally, survives full DC outages via multi-region replication, and provides operational levers (cache sizing, CDN invalidation, sampling) to balance costs versus performance.

Result

#1 | Winner

Winning Votes

3 / 3

Average Score

Judge Models Google Gemini 2.5 Flash

Total Score

Overall Comments

Answer B delivers an exceptionally detailed and robust system design. It distinguishes itself by incorporating advanced concepts like edge functions for ultra-low latency redirects, which directly addresses the strict p99 latency requirement. The numerical rigor is outstanding, with precise estimates for storage, cache sizing, and Kafka throughput. The URL generation strategy using distributed ID blocks is well-articulated, and the analytics pipeline is meticulously designed for complete decoupling and fault tolerance, even considering event sampling and local buffering. The discussion on reliability is very strong, explicitly mentioning DynamoDB Global Tables and cold-fronting scenarios. The trade-offs are numerous and deeply justified, showcasing a profound understanding of system design principles. The overall clarity and organization, including the initial summary and operational considerations, make it a superior response.

View Score Details ▼

Architecture Quality

Weight 30%

The architecture is exceptionally well-designed, introducing edge functions (CloudFront Function/Lambda@Edge) for critical latency requirements, which is a significant advantage. The ID allocation service is well-integrated, and the overall component interaction is described with greater precision and foresight for extreme scale.

Completeness

Weight 20%

All nine required sections (A-I) are explicitly addressed. Additionally, Answer B includes a concise summary, operational considerations, and security features, which adds to its completeness and demonstrates a more holistic design perspective.

Trade-off Reasoning

Weight 20%

Four significant trade-offs are identified and justified with excellent reasoning, demonstrating a deeper understanding of the implications of each choice. The justifications are more detailed and directly link to the system's requirements and constraints.

Scalability & Reliability

Weight 20%

This answer excels in scalability and reliability. It provides highly specific details on multi-region active-active deployment, DynamoDB Global Tables, fault tolerance for the ID allocator, and considerations for regional cache warming during failover. The numerical estimates for Kafka throughput further strengthen this section.

Clarity

Weight 10%

The answer is exceptionally clear and well-organized. The initial summary provides an excellent overview, and each section is meticulously detailed with bullet points and specific examples. The language is precise and easy to understand, even for complex concepts.

Judge Models Anthropic Claude Sonnet 4.6

Total Score

Overall Comments

Answer B is a more comprehensive and rigorous response. It explicitly addresses all nine sections plus adds operational considerations, security, and a conclusion. The numerical rigor is stronger: QPS derivations (100M/month → ~38.6 RPS avg, 10x peak → ~38.6k RPS), cache sizing (50M keys × 200B = 10GB), Kafka throughput estimates (40k events/s × 500B = 20MB/s), and storage projections (20GB/month, 2.4TB over 10 years) are all clearly derived. The CDN-first edge caching strategy is the correct primary mechanism for achieving <10ms p99 at global scale, which Answer A underemphasizes. The trade-off section includes four well-justified trade-offs with concrete reasoning. The write path, reliability, and analytics sections are more detailed. The ID allocation block approach is well-explained. Minor weaknesses: the response is quite long and could be more concise, and some sections (security, operational considerations) go beyond what was asked, though they add value.

View Score Details ▼

Architecture Quality

Weight 30%

Answer B presents a more complete architecture with CDN as the primary optimization layer, edge functions for cache-miss handling, regional Redis caches as secondary, and DynamoDB Global Tables as the durable store. The interaction between components is clearly described. The CDN-first approach is architecturally correct for the latency requirement. The ID allocation block approach is well-integrated into the write path. The analytics pipeline decoupling is thorough.

Completeness

Weight 20%

Answer B addresses all nine required sections and adds operational considerations, security notes, and a conclusion. QPS estimates are derived step-by-step (100M/month → 38.6 RPS avg, 10x peak), storage is calculated (20GB/month, 2.4TB over 10 years), cache sizing is justified (10GB for 50M hot keys), and Kafka throughput is estimated. This level of numerical completeness significantly exceeds Answer A.

Trade-off Reasoning

Weight 20%

Answer B identifies four trade-offs with more concrete justifications: CDN edge caching vs. consistency for deletes/updates (with specific mitigation strategies), sequential ID vs. hashing (with explicit comparison of pros/cons), DynamoDB vs. Cassandra (with operational reasoning), and synchronous write-through vs. eventual cache propagation (with QPS justification). The reasoning is more specific and actionable.

Scalability & Reliability

Weight 20%

Answer B covers multi-active multi-region deployment, DynamoDB Global Tables active-active replication, cache warm-up considerations during failover, ID allocator fault tolerance with large block allocation, Kafka replication factor 3, and point-in-time recovery. The discussion of cache cold-start after failover is a concrete operational insight. The scaling section includes specific numbers (100 Kafka partitions, Redis shard sizing) that make it more actionable.

Clarity

Weight 10%

Answer B is comprehensive but somewhat verbose. The additional sections (operational considerations, security, conclusion) add value but also increase length. The core sections are well-organized and the numbered lists within sections aid readability. However, the sheer volume of content makes it slightly harder to navigate compared to Answer A's more focused structure.

Judge Models OpenAI GPT-5.4

Total Score

Overall Comments

Answer B is comprehensive, technically specific, and stronger on end-to-end coherence. It explicitly addresses all required sections, provides concrete technologies, and includes more detailed quantitative estimates for QPS, storage, cache sizing, code-space longevity, and analytics throughput. The design is logically connected from CDN edge through caches, database, range-based ID allocation, and async analytics. It also discusses conditional inserts, cache invalidation, partitioning, message-bus sizing, and multi-region failover in a more operationally realistic way. Its trade-off section is clearer and more nuanced. Minor weaknesses include occasional over-optimism around CDN edge behavior and some added assumptions not strictly necessary, but overall it is a stronger benchmark-quality system design.

View Score Details ▼

Architecture Quality

Weight 30%

The architecture is highly coherent and traces the system from edge to origin to analytics with strong separation of concerns. It clearly describes request interactions, cache hierarchy, write flow, invalidation, and dashboard isolation, making the design feel more production-ready.

Completeness

Weight 20%

It covers all required sections thoroughly and adds useful operational considerations such as monitoring, invalidation, partitioning, and capacity planning. It explicitly addresses numerical estimates, storage, code generation, redirect path, writes, failover, analytics, and trade-offs with strong coverage.

Trade-off Reasoning

Weight 20%

It presents multiple concrete trade-offs and explains why each choice fits the workload. The discussion is more nuanced, especially around edge caching versus consistency, allocator complexity versus collision handling, managed versus self-hosted storage, and synchronous cache population versus eventual propagation.

Scalability & Reliability

Weight 20%

The design handles scalability and resilience with more concrete mechanisms: multi-active regions, global tables or Cassandra replication, range allocation to avoid central write bottlenecks, sharded caches, message-bus partitioning, backup strategy, and failover behavior. It better explains how the system continues operating during a full data center outage.

Clarity

Weight 10%

The answer is very well organized, with explicit sections, bullet points, and logical progression. It balances readability with specificity and makes the reasoning behind design choices easier to follow.

Comparison Summary

Final rank order is determined by judge-wise rank aggregation (average rank + Borda tie-break). Average score is shown for reference.

Judges: 3

Google Gemini 2.5 Pro

Winning Votes

0 / 3

Average Score

View this answer

Winner OpenAI GPT-5 mini

Winning Votes

3 / 3

Average Score

View this answer

View head-to-head record for this model pair

Judging Results

Judge Models OpenAI GPT-5.4

Gemini 2.5 Pro 74

GPT-5 mini Winner 88

Why This Side Won

Answer B wins because it is more complete in practice, more numerically rigorous, and more concrete about how the system achieves the latency, scale, and availability requirements. It gives a clearer read path with CDN plus regional cache behavior, a better-specified distributed ID allocation strategy, stronger operational estimates, and more detailed reliability and analytics-decoupling mechanisms. Answer A is good but remains more generic and less deeply justified in several core areas.

Judge Models Anthropic Claude Sonnet 4.6

Gemini 2.5 Pro 72

GPT-5 mini Winner 83

Why This Side Won

Answer B wins on numerical rigor, architecture depth, and completeness. It provides concrete QPS derivations, storage estimates, cache sizing calculations, and Kafka throughput numbers that Answer A largely omits or handles superficially. Answer B's CDN-first edge caching strategy is the correct primary mechanism for achieving sub-10ms p99 redirects at global scale — a critical insight that Answer A underemphasizes. Answer B also provides four well-justified trade-offs versus three in Answer A, and its analytics pipeline design is more detailed. While both answers are competent, Answer B demonstrates stronger engineering judgment and quantitative rigor across all evaluated dimensions.

Judge Models Google Gemini 2.5 Flash

Gemini 2.5 Pro 77

GPT-5 mini Winner 84

Why This Side Won

Answer B is superior due to its significantly higher level of detail, advanced architectural considerations (like edge functions), and superior numerical rigor. It provides more specific technology choices and deeper justifications for design decisions, particularly in read path optimization, analytics separation, and reliability. The trade-off analysis is also more comprehensive and insightful. While Answer A is a strong baseline, Answer B demonstrates a more expert-level understanding and implementation of a highly scalable and performant system.

Design a URL Shortening Service at Scale

Task Overview

Task Prompt

Answers

Answer A: Google Gemini 2.5 Pro

A. High-level Architecture

A. High-level Architecture

B. URL Generation Strategy

C. Data Model and Storage

D. Read Path Optimization

E. Write Path

F. Scaling Strategy

G. Reliability and Fault Tolerance

H. Analytics Pipeline

I. Key Trade-offs

Answer B: OpenAI GPT-5 mini

Comparison Summary

Judging Results

Related Tasks

Creative Uses for Retired Shipping Containers

Helping a Friend Navigate Guilt After Setting Boundaries with Family

Victorian-Era Botanist Advises on Houseplant Care

Explain the Paradox of the Second Law of Thermodynamics and Biological Evolution

Explain Database Indexing to a Junior Developer

Evaluating Transportation Options for a Mid-Size City

Write a Humorous Office Email Chain About a Missing Stapler

Plan a Three-Day Community Science Fair for a Small Town

Related Links