Orivel

Design a URL Shortening Service

Compare model answers for this System Design benchmark and review scores, judging comments, and related examples.



Task Overview

Benchmark Genres: System Design

Task Creator Model

Answering Models

Judge Models

Task Prompt


Design a URL shortening service (similar to bit.ly or tinyurl.com) that must handle the following constraints:

1. The service must support 100 million new URL shortenings per month.
2. The ratio of read (redirect) requests to write (shorten) requests is 100:1.
3. Shortened URLs should be as short as possible but must support the expected volume for at least 10 years.
4. The system must achieve 99.9% uptime availability.
5. Redirect latency must be under 50ms at the 95th percentile.
6. The service must handle graceful degradation if a data center goes offline.

In your design, address each of the following areas:

A) API Design: Define the key API endpoints and their contracts.
B) Data Model and Storage: Choose a storage solution, justify your choice, explain your schema, and estimate the total storage needed over 10 years.
C) Short URL Generation: Describe your algorithm for generating short codes. Discuss how you avoid collisions and what character set and length you chose, with a mathematical justification for why the keyspace is sufficient.
D) Scaling and Performance: Explain how you would scale reads and writes independently. Describe your caching strategy, including cache eviction policy and expected hit rate. Explain how you meet the 50ms p95 latency requirement.
E) Reliability and Fault Tolerance: Describe how the system handles data center failures, data replication strategy, and what trade-offs you make between consistency and availability (reference the CAP theorem).
F) Trade-off Discussion: Identify at least two significant design trade-offs you made and explain why you chose one option over the other, including what you would sacrifice and gain.

Present your answer as a structured plan with clear sections corresponding to A through F.

Judging Policy


A strong answer should be a well-structured plan covering all six sections (A through F). Evaluate based on the following criteria:

1. Completeness: All six areas must be addressed with substantive detail, not just surface-level mentions.
2. API Design: Should include at least a POST endpoint for creating short URLs and a GET endpoint for redirection, with reasonable input/output contracts.
3. Storage and Data Model: Should include a justified choice of database (e.g., NoSQL for write throughput or relational for consistency), a clear schema, and a reasonable storage estimate. The estimate should be grounded in the given constraints (100M writes/month over 10 years).
4. Short URL Generation: Must include a concrete algorithm (e.g., base62 encoding, counter-based, hash-based, or a pre-generated key service). The answer should mathematically justify the keyspace (e.g., base62 with 7 characters yields ~3.5 trillion combinations, which exceeds 12 billion URLs over 10 years). Collision handling must be addressed.
5. Scaling and Performance: Should describe horizontal scaling, read replicas or partitioning, and a caching layer (e.g., Redis or Memcached) with a reasonable eviction policy. The answer should connect the caching strategy to the 50ms latency target and the 100:1 read-to-write ratio.
6. Reliability: Should discuss multi-datacenter replication, failover mechanisms, and explicitly reference CAP theorem trade-offs (e.g., choosing AP over CP for availability during partitions).
7. Trade-offs: Must identify at least two genuine trade-offs with clear reasoning about what is gained and lost. Examples include consistency vs. availability, simplicity vs. scalability, or storage cost vs. computation cost.

Penalize answers that are vague, miss entire sections, provide no quantitative reasoning, or fail to connect design choices to the stated constraints. Reward answers that demonstrate depth of reasoning, realistic numbers, and awareness of practical engineering concerns.
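The keyspace and volume figures cited in this policy reduce to a few lines of arithmetic, sketched here as a quick sanity check:

```python
# Verify the numbers referenced in the judging policy.
urls_10y = 100_000_000 * 12 * 10   # 100M new URLs/month for 10 years
keyspace_6 = 62 ** 6               # base62, 6 characters
keyspace_7 = 62 ** 7               # base62, 7 characters

print(f"{urls_10y:,}")             # 12,000,000,000
print(f"{keyspace_6:,}")           # 56,800,235,584  (~56.8 billion)
print(f"{keyspace_7:,}")           # 3,521,614,606,208 (~3.5 trillion)
assert keyspace_7 > urls_10y       # 7 chars comfortably exceeds demand
```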

Answers


A) API Design

1) Create short URL
- Endpoint: POST /v1/urls
- Request JSON:
  - long_url (string, required, max e.g. 8KB)
  - custom_alias (string, optional; if provided must be unique and pass allowed charset/length)
  - expire_at (timestamp, optional)
  - metadata (object, optional: e.g., campaign, user_id)
- Response (201):
  - code (string)
  - short_url (string)
  - long_url (string)
  - created_at (timestamp)
  - expire_at (timestamp|null)
- Errors:
  - 400 invalid URL/alias
  - 409 alias already exists
  - 429 rate limited

2) Resolve/redirect
- Endpoint: GET /{code}
- Behavior:
  - 301 Moved Permanently by default (or 302 if tracking/experimentation requires)
  - Location header = long_url
  - Optional: GET /v1/urls/{code} returns JSON instead of a redirect (for API clients)
- Errors:
  - 404 not found/expired

3) Delete/disable (optional, for abuse/user control)
- Endpoint: DELETE /v1/urls/{code}
- Auth required
- Response: 204

4) Analytics (optional, can be async/approximate)
- Endpoint: GET /v1/urls/{code}/stats?from=&to=
- Returns aggregated counters (clicks by day, referrer, geo, etc.)

Notes:
- Auth: API keys/OAuth for creation and management; redirects remain public.
- Idempotency: support an Idempotency-Key header on POST to avoid duplicates on retries.

B) Data Model and Storage

Workload
- Writes: 100M new/month ≈ 3.33M/day ≈ 38.6 writes/sec average (higher at peak).
- Reads: 100:1 => 10B redirects/month ≈ 3.86K reads/sec average (peaks much higher).

Storage choice
- Primary store: distributed key-value / wide-column store (e.g., DynamoDB/Cassandra/ScyllaDB) optimized for very high read QPS, low latency, horizontal scaling, and multi-DC replication.
- Rationale: the access pattern is primarily key lookup by short code; values are small; there is a strong need for high availability and predictable latency.
- Secondary systems:
  - Cache: Redis/Memcached clusters per region.
  - Analytics pipeline: Kafka/PubSub + stream processing + OLAP store (not on the critical path).
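The workload figures above follow from simple average-rate arithmetic; a sketch (using a 30-day month — real provisioning would add a peak-to-average multiplier):

```python
# Average-rate arithmetic behind the workload estimates.
SECONDS_PER_MONTH = 30 * 24 * 3600          # 2,592,000s in a 30-day month

writes_per_month = 100_000_000              # given constraint
reads_per_month = writes_per_month * 100    # 100:1 read-to-write ratio

writes_per_sec = writes_per_month / SECONDS_PER_MONTH
reads_per_sec = reads_per_month / SECONDS_PER_MONTH

print(f"{writes_per_sec:.1f} writes/sec")   # ≈ 38.6
print(f"{reads_per_sec:,.0f} reads/sec")    # ≈ 3,858
```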
Schema

1) url_mapping (primary)
- Partition key: code (string)
- Columns:
  - long_url (string)
  - created_at (timestamp)
  - expire_at (timestamp|null)
  - user_id (string|null)
  - status (active/deleted)
  - checksum (optional, for integrity)
- TTL: if expire_at is set, use native TTL so expired entries auto-remove.

2) alias_reservation (optional, if supporting custom aliases with uniqueness)
- Key: custom_alias
- Value: code / ownership metadata

3) idempotency (optional)
- Key: (user_id, idempotency_key)
- Value: code
- TTL: e.g., 24h

Storage estimate (10 years)
- New URLs in 10 years: 100M/month * 12 * 10 = 12B mappings.
- Rough per-record size:
  - code: ~8 bytes (or up to ~10 chars)
  - long_url: assume avg 200 bytes (conservative), plus overhead
  - metadata/columns/index overhead: ~100–200 bytes depending on the store
  - Planning figure: ~400 bytes/record (before replication).
- Raw data: 12B * 400B = 4.8TB.
- With storage-engine overhead + compaction + secondary indexes: plan ~2–3x => ~10–15TB.
- Replication:
  - Within region (RF=3): ~30–45TB.
  - Multi-region (e.g., 2 regions active-active): double => ~60–90TB total across all regions.
- These are planning numbers; exact figures depend on average URL length and DB overhead.

C) Short URL Generation

Goals
- As short as possible
- Collision-free at high scale
- Supports at least 10 years of growth

Keyspace math
- Total codes needed in 10 years: 12B.
- Base62 charset: [0-9][a-z][A-Z] => 62 symbols.
- Capacity at length L: 62^L.
  - 62^6 ≈ 56.8B (sufficient for 12B)
  - 62^7 ≈ 3.52T (more headroom)

Choice
- Use variable length with a floor of 6 characters.
- Start at 6 chars for the first ~56.8B codes; that already covers the 10-year requirement with significant margin (56.8B/12B ≈ 4.7x).
- Keep the ability to move to 7 chars in the future without breaking existing links.

Generation algorithm (no collisions)

Option selected: globally unique numeric IDs + Base62 encoding.
- Maintain a monotonically increasing 64-bit ID space.
- Encode the ID in Base62 to produce the code.
- Collision avoidance: none needed, because IDs are unique by construction.

How to generate IDs at scale
- Use a distributed ID allocator similar to Snowflake, or a central "ID service" with range leasing:
  1) The ID service stores a counter in a strongly consistent store (e.g., etcd/Spanner) per region.
  2) Each app server leases an ID block (e.g., 1M IDs) to generate locally.
  3) When a block is exhausted, the server requests a new block.
- This avoids per-write contention while ensuring uniqueness.

Handling custom aliases
- If custom_alias is provided:
  - Validate charset/length.
  - Conditional write (compare-and-set) on url_mapping with key=alias; if it exists, return 409.

D) Scaling and Performance

High-level architecture
- Anycast/GeoDNS -> Global Load Balancer -> Regional L7 load balancers -> stateless API/redirect servers.
- Separate services:
  - Write path: URL creation service
  - Read path: Redirect service
  - Shared storage + cache

Scaling reads and writes independently
- Redirect service scaled horizontally based on QPS/latency (likely the dominant cost).
- Create service scaled based on write throughput.
- Use separate autoscaling groups and independent deployment pipelines.

Data partitioning
- Partition by code hash (or by the code itself) across DB nodes.
- Distribution is even because codes are effectively uniform over the ID space once Base62-encoded.

Caching strategy
- Primary: CDN/edge caching for redirects + regional in-memory cache.
  1) CDN:
     - Cache 301/302 responses keyed by path (code) with a TTL (e.g., 5–60 minutes, or respect expire_at).
     - For extremely hot links, the CDN can absorb most traffic globally.
  2) Regional cache (Redis/Memcached):
     - Key: code
     - Value: long_url + expire_at + status
     - TTL: min(default_ttl, expire_at - now)
     - Eviction: LRU or LFU (prefer LFU for skewed popularity).

Expected hit rate
- URL accesses are typically highly skewed (Zipf).
- With CDN + regional cache, a 95–99% hit rate is feasible for redirects; even 90% helps substantially.
- Cache warming: on write, push the mapping to cache; also use negative caching for misses (short TTL) to reduce repeated DB hits for invalid codes.

Meeting <50ms p95 redirect latency
- Critical path for a cached hit:
  - Edge/CDN hit: often <20ms.
  - Regional cache hit: LB + app + Redis lookup + response: typically 5–15ms within a region.
- For a cache miss:
  - Single DB read from a local-region replica: target 5–20ms.
- Keep redirect servers close to users via multi-region + edge.
- Use keep-alive and HTTP/2 between components, plus connection pooling to Redis/DB.
- Avoid synchronous analytics; emit click events asynchronously to a queue.

Write performance
- Writes are low compared to reads, but bursts must still be handled.
- Write flow:
  1) Generate ID/code
  2) Write to DB (quorum/majority within region)
  3) Write-through to cache
  4) Return response
- Optional: dedupe identical long_urls by storing a hash->code index (trade-off; see F).

E) Reliability and Fault Tolerance

Availability target: 99.9%
- Multi-AZ within each region for all stateful components.
- At least two regions active-active for redirect traffic.

Replication strategy
- Within a region: replication factor 3 across AZs.
- Cross-region:
  - Asynchronous replication for url_mapping to keep write latency low and availability high.
  - Redirects are served from the local region; if a local mapping is missing due to replication lag, fall back to another region (see below).

Data center/region failure handling (graceful degradation)
- Use Global Load Balancer health checks:
  - If a region is unhealthy, route traffic to the next closest healthy region.
- For redirect requests, if the regional cache/DB is degraded, the redirect service can:
  1) Try the cache
  2) Try the local DB
  3) If the local DB is unavailable, query a remote region's read endpoint (with a strict timeout) or use a cross-region read replica.
  4) If all fail, return a fast 503 with Retry-After (graceful failure).
- For write requests:
  - Prefer local-region writes; if the region is down, fail over to another region.
  - ID allocation: each region has its own ID block namespace (or uses region bits in Snowflake) to avoid conflicts during failover.

CAP theorem trade-off
- Choose Availability over strong Consistency for global operations:
  - Redirects must be highly available; briefly stale reads are acceptable.
  - Cross-region replication is async: a newly created short URL might not resolve in a different region for a short window. Mitigations:
    - After creation, return a short URL hosted on the same region's domain (or use an edge that routes to the origin region for the first few minutes).
    - On create, optionally do a synchronous write to two regions for a premium/enterprise tier (higher latency/cost).

Backups and disaster recovery
- Periodic snapshots + incremental backups to object storage.
- Regular restore drills.

F) Trade-off Discussion

1) ID-based code generation vs. random tokens
- Chosen: ID-based (range-leased/Snowflake) + Base62.
- Gains:
  - No collisions, no retry loops.
  - Predictable capacity planning and simpler DB constraints.
  - Faster writes.
- Sacrifices:
  - Codes are guessable/sequential (no security through obscurity). Mitigation: add an optional "salted permutation" (e.g., a Feistel network) over the numeric ID before Base62 encoding to make codes non-sequential while preserving uniqueness.

2) Async cross-region replication (AP) vs. synchronous multi-region (CP)
- Chosen: async cross-region replication.
- Gains:
  - Lower write latency, higher availability during partial failures.
  - Better ability to continue serving redirects even if a region is impaired.
- Sacrifices:
  - Eventual consistency: a brief window where a newly created URL may not be readable in another region. Mitigation: read-repair/fallback to the origin region for fresh links; optional dual-write for high-SLA customers.

3) Heavy caching/CDN vs. always hitting the DB
- Chosen: CDN + regional cache.
- Gains:
  - Easily meets the <50ms p95 target and drastically reduces DB load.
  - Handles traffic spikes for viral links.
- Sacrifices:
  - Cache-invalidation complexity (e.g., deleted/expired links). Mitigation: short TTLs, respect expire_at, explicit CDN purge on delete where supported; use status flags and negative caching carefully.
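This answer's code scheme (Base62 over a unique ID, optionally permuted with a Feistel network so codes are not guessably sequential) can be sketched as follows. The 32-bit width, round count, and key below are illustrative assumptions, not part of the answer:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62(n: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

def feistel32(n: int, rounds: int = 4, key: int = 0x5BD1E995) -> int:
    """Toy balanced Feistel network over 32 bits: a bijection, so
    sequential IDs map to unique but non-sequential values."""
    left, right = n >> 16, n & 0xFFFF
    for _ in range(rounds):
        # Standard Feistel round: swap halves, mix one with the other.
        left, right = right, left ^ (((right * key) >> 8) & 0xFFFF)
    return (left << 16) | right

# Sequential IDs 1, 2, 3 yield scattered-looking short codes.
for i in (1, 2, 3):
    print(i, base62(feistel32(i)))
```

Because a Feistel construction is invertible regardless of its round function, no collision checking is needed: distinct IDs always yield distinct codes.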

Result

#1 | Winner

Winning Votes

3 / 3

Average Score

88

Total Score

84

Overall Comments

Answer A is a comprehensive, deeply detailed system design that covers all six sections with substantial depth. It provides precise quantitative reasoning throughout (QPS calculations, storage estimates with replication factors, cache hit rates), addresses edge cases (idempotency, negative caching, abuse handling), and offers practical engineering details like connection pooling, HTTP/2, Feistel network for code obfuscation, and write-through caching. The API design includes rate limiting, idempotency headers, and analytics endpoints. The storage estimation accounts for replication across regions. The trade-off section identifies three genuine trade-offs with clear mitigations. The reliability section provides a detailed fallback chain for degraded scenarios. Overall, it demonstrates senior-level engineering depth and practical awareness.

Score Details

Architecture Quality

Weight: 30% | Score: 85

Answer A provides a thorough architecture with explicit QPS calculations (38.6 writes/sec, 3.86K reads/sec), multi-layer caching (CDN + regional Redis/Memcached), detailed write and read paths, connection pooling, HTTP/2 optimization, async analytics via Kafka, and a well-designed ID allocation system with range leasing. The architecture choices are well-justified and connected to the constraints.

Completeness

Weight: 20% | Score: 85

Answer A covers all six sections with substantial detail. It includes additional endpoints (DELETE, analytics), idempotency support, negative caching, alias reservation schema, detailed storage estimates with replication factors across regions (60-90TB total), and three trade-offs with mitigations. The storage estimate realistically accounts for overhead and multi-region replication.

Trade-off Reasoning

Weight: 20% | Score: 80

Answer A identifies three genuine trade-offs: ID-based vs random tokens, async vs sync cross-region replication, and heavy caching vs always hitting DB. Each trade-off includes clear gains, sacrifices, and practical mitigations (Feistel network for code obfuscation, read-repair for replication lag, short TTLs for cache invalidation). The reasoning demonstrates deep understanding of engineering trade-offs.

Scalability & Reliability

Weight: 20% | Score: 85

Answer A provides a detailed fallback chain for degraded scenarios (cache -> local DB -> remote region -> 503 with Retry-After), region-aware ID allocation to avoid conflicts during failover, explicit latency breakdown for cached and uncached paths, and discusses both within-region (RF=3) and cross-region replication. The connection between caching strategy and the 50ms p95 target is explicit and convincing.

Clarity

Weight: 10% | Score: 80

Answer A is well-organized with clear section headers, sub-sections, and bullet points. The use of notes, numbered lists, and explicit labels makes it easy to follow. The quantitative reasoning is clearly presented. Some sections are dense but remain readable.

Total Score

91

Overall Comments

Answer A provides an outstanding, expert-level system design. It demonstrates a deep understanding of distributed systems principles by selecting a highly appropriate tech stack (distributed K-V store), providing a detailed and realistic storage calculation, and outlining a sophisticated scaling and reliability strategy that includes CDN/edge caching, multi-region active-active deployment, and a clear graceful degradation path. The API design is comprehensive, and the trade-off discussion is nuanced and insightful. The level of detail across all sections is exceptional.

Score Details

Architecture Quality

Weight: 30% | Score: 90

The architecture is exceptionally well-suited for the problem. The choice of a distributed key-value store like DynamoDB/Cassandra is ideal for the high-read, key-lookup workload. The overall design, from Anycast/GeoDNS down to the separated read/write services, is robust, scalable, and demonstrates expert-level thinking.

Completeness

Weight: 20% | Score: 90

The answer is extremely complete, addressing all six sections in great detail. The API design is particularly thorough, including optional but important endpoints for analytics and deletion, as well as considerations for idempotency and authentication.

Trade-off Reasoning

Weight: 20% | Score: 90

The answer provides three distinct and highly relevant trade-offs. The reasoning is excellent, clearly outlining what is gained and sacrificed, and importantly, includes practical mitigations for the downsides of each choice. This demonstrates a mature understanding of engineering trade-offs.

Scalability & Reliability

Weight: 20% | Score: 95

This is an outstanding section. The plan for scaling reads and writes independently is clear. The caching strategy is multi-layered (CDN + regional cache), which is critical for meeting the latency target at a global scale. The reliability plan is also excellent, with a detailed graceful degradation path and a clear, well-mitigated AP choice in the CAP theorem.

Clarity

Weight: 10% | Score: 90

The answer is exceptionally clear and well-structured. It follows the requested A-F format perfectly, using headings, sub-bullets, and concise language to present complex ideas in an easy-to-digest manner.

Judge Models: OpenAI GPT-5.4

Total Score

88

Overall Comments

Answer A is a strong, well-structured system design that covers all required sections A through F with concrete APIs, a realistic data model, quantitative capacity estimates, and a practical architecture for caching, multi-region failover, and latency control. It ties choices directly to the workload, includes operational details like idempotency, negative caching, CDN behavior, quorum/range-leased ID generation, and graceful degradation paths. Minor weaknesses are that some estimates are still coarse and a few implementation choices are presented as options rather than a single sharply committed design.

Score Details

Architecture Quality

Weight: 30% | Score: 88

Uses a coherent end-to-end architecture matched to the workload: stateless read/write services, distributed KV store, regional caches plus CDN, async analytics, multi-region routing, and a concrete ID-generation strategy. The design shows good separation of concerns and practical implementation details.

Completeness

Weight: 20% | Score: 90

Addresses all six required sections substantively and adds useful details such as auth, idempotency, optional analytics, alias reservation, backup strategy, and graceful-degradation steps. Quantitative reasoning is included in multiple sections.

Trade-off Reasoning

Weight: 20% | Score: 85

Identifies multiple meaningful trade-offs with explicit gains, sacrifices, and mitigations, including ID-based generation vs random tokens, async replication vs synchronous replication, and caching complexity vs latency. The reasoning is tied to the workload and SLA requirements.

Scalability & Reliability

Weight: 20% | Score: 91

Strong treatment of horizontal scaling, partitioning, read/write separation, expected cache hit rates, CDN support, p95 latency path, RF=3, multi-AZ and active-active regional setup, async cross-region replication, and fallback behavior during regional outages. Explicitly addresses graceful degradation with concrete request flow.

Clarity

Weight: 10% | Score: 87

Very readable, logically organized by A-F, and easy to follow. The use of bullets, formulas, and stepwise flows helps comprehension, though the answer is somewhat dense.


A) API Design:

1. **Shorten URL Endpoint:**
   * **Method:** `POST`
   * **Path:** `/shorten`
   * **Request Body (JSON):**
     ```json
     {
       "long_url": "string",
       "custom_short_code": "string (optional)",
       "expires_at": "ISO 8601 datetime string (optional)"
     }
     ```
   * **Response Body (JSON - 201 Created):**
     ```json
     {
       "short_url": "string",
       "long_url": "string",
       "custom_short_code": "string (if provided)",
       "expires_at": "ISO 8601 datetime string (if provided)"
     }
     ```
   * **Error Responses:**
     * `400 Bad Request`: Invalid or missing `long_url`.
     * `409 Conflict`: `custom_short_code` already exists.
     * `500 Internal Server Error`: General server error.

2. **Redirect Short URL Endpoint:**
   * **Method:** `GET`
   * **Path:** `/{shortCode}`
   * **Response:** `302 Found` HTTP redirect to the `long_url`.
   * **Error Responses:**
     * `404 Not Found`: `shortCode` does not exist or has expired.
     * `500 Internal Server Error`: General server error.

B) Data Model and Storage:

**Storage Choice:** A sharded relational database (e.g., PostgreSQL) with read replicas.

**Justification:**
* **Strong Consistency for Writes:** Essential for ensuring the uniqueness of short codes. Relational databases provide ACID properties, which are crucial for reliable data integrity during URL creation.
* **Well-understood Sharding:** PostgreSQL supports various sharding strategies (e.g., range-based, hash-based) to distribute write load and scale horizontally.
* **Efficient Indexing:** B-tree indexes are highly efficient for point lookups on `short_code`, the primary operation for redirects.
* **Read Replicas:** Read operations (redirects) scale easily by adding read-only replicas.
* **Maturity and Ecosystem:** Robust tools for monitoring, backup, and recovery.

**Schema:**

`urls` table:
* `id`: BIGINT, primary key, auto-incrementing (or generated by a distributed ID service). Used as the basis for the short code.
* `short_code`: VARCHAR(7), unique index, NOT NULL. The actual short string used in the URL.
* `long_url`: TEXT, NOT NULL. The original long URL.
* `created_at`: TIMESTAMP WITH TIME ZONE, NOT NULL, default `NOW()`.
* `expires_at`: TIMESTAMP WITH TIME ZONE (nullable). When the short URL should expire.
* `user_id`: BIGINT (nullable; foreign key to a `users` table if user accounts are supported).

**Storage Estimation (over 10 years):**
* **Volume:** 100 million new URLs/month * 12 months/year * 10 years = 12 billion URLs.
* **Record Size Estimation:**
  * `id`: 8 bytes (BIGINT)
  * `short_code`: 7 bytes (VARCHAR(7))
  * `long_url`: average 100 bytes (TEXT, variable length)
  * `created_at`: 8 bytes (TIMESTAMP)
  * `expires_at`: 8 bytes (TIMESTAMP)
  * Total per record: ~131 bytes.
* **Raw Data Size:** 12 billion records * 131 bytes/record = 1.572 TB.
* **With Indexes and Overhead:** Accounting for indexes (especially on `short_code` and `id`), replication, and database overhead, total storage is approximately 3-5 TB over 10 years. This is manageable with a sharded database architecture.

C) Short URL Generation:

**Algorithm:**
1. When a new URL shortening request arrives, a unique, monotonically increasing ID is obtained. This ID can be generated by:
   * **Database Auto-increment:** With a sharded database, each shard is configured with a unique ID range (e.g., shard 1 uses IDs 1, 11, 21, ...; shard 2 uses 2, 12, 22, ...).
   * **Distributed ID Generator:** A dedicated service like Snowflake (or a similar custom implementation) generates globally unique, time-ordered 64-bit integers.
2. The unique ID is then converted to a base62 string. Base62 uses 62 characters (0-9, a-z, A-Z), which allows a compact representation.

**Collision Avoidance:**
* By deriving the short code from a globally unique ID (via sharded auto-increment or a distributed ID generator), collisions are inherently avoided: each generated ID is unique, and its base62 representation is therefore unique as well.
* For `custom_short_code` requests, the system attempts to insert the user-provided code. If a conflict occurs (an entry already exists), a `409 Conflict` error is returned to the user.

**Character Set and Length:**
* **Character Set:** Base62 (0-9, a-z, A-Z) - 62 unique characters.
* **Length:** 6 characters initially.

**Mathematical Justification for Keyspace Sufficiency:**
* **Required Volume:** 12 billion unique URLs over 10 years.
* **Keyspace with 6 characters:** 62^6 = 56,800,235,584 (approximately 56.8 billion unique codes).
* This keyspace (56.8 billion) is significantly larger than the required 12 billion URLs, providing ample room for growth and ensuring sufficiency for at least 10 years. If the service were ever to exceed it, the short-code length could be extended to 7 characters, giving 62^7 (approximately 3.5 trillion) unique codes.

D) Scaling and Performance:

**Scaling Reads (Redirects):**
* **Distributed Cache (e.g., Redis Cluster):** The primary scaling mechanism for reads. `short_code -> long_url` mappings are stored in a highly available, distributed cache. Given the 100:1 read/write ratio, most redirect requests are served directly from the cache.
* **Database Read Replicas:** The sharded PostgreSQL instances have multiple read replicas. On a cache miss, the request falls back to a read replica; load balancers distribute queries across the replicas.
* **Content Delivery Network (CDN):** While not directly caching the 302 redirect, a CDN can serve static assets and optimize routing for initial requests, reducing latency for geographically dispersed users.

**Scaling Writes (Shortenings):**
* **Database Sharding:** The core strategy. Data is partitioned across multiple PostgreSQL instances based on a sharding key (e.g., the `id` from which the `short_code` is derived), distributing write load and storage requirements.
* **Connection Pooling:** Efficiently manages database connections to reduce overhead.
* **Asynchronous Processing:** For non-critical tasks (analytics, logging, statistics updates), message queues (e.g., Kafka, RabbitMQ) offload work from the primary write path, improving responsiveness.

**Caching Strategy:**
* **Cache Type:** Distributed in-memory cache (e.g., Redis Cluster).
* **Data Cached:** `short_code` to `long_url` mappings.
* **Cache Population:** On a successful shortening, the new mapping is written to both the database and the cache (write-through). On a cache miss during a redirect, the mapping is fetched from the database and then written to the cache (cache-aside).
* **Eviction Policy:** LRU (Least Recently Used). Popular URLs remain in the cache while less frequently accessed ones are evicted to make space. A TTL can also be applied, especially for URLs with an `expires_at` field.
* **Expected Hit Rate:** Given the 100:1 read/write ratio and the power-law distribution of URL access (a small percentage of URLs accounts for a large percentage of traffic), a cache hit rate of 90-95%+ is highly achievable, significantly reducing database load.

**Meeting the 50ms p95 Latency Requirement:**
* **High Cache Hit Rate:** The vast majority of redirect requests are served from the distributed cache, which responds in single-digit milliseconds.
* **Optimized Database Access:** For cache misses, efficient indexing on `short_code` in the sharded database ensures fast lookups.
* **Proximity and Network:** Services are deployed in multiple regions close to users, over high-speed, low-latency network infrastructure.
* **Load Balancing:** Requests are distributed efficiently across healthy service instances and database replicas.
* **Stateless Application Servers:** Allow easy horizontal scaling of the API layer, ensuring sufficient capacity for peak loads without becoming a bottleneck.
D) Reliability and Fault Tolerance: **Data Center Failures:** * **Multi-Region Deployment:** The entire service (API servers, cache clusters, database shards) is deployed across at least two geographically distinct data centers/regions. * **Global Load Balancer (e.g., DNS-based traffic management):** Directs user traffic to the nearest healthy region. If one region fails, traffic is automatically rerouted to the operational region. * **Automated Failover:** Database clusters are configured for automatic failover within a region (primary to replica) and across regions (active-passive or active-active setup). **Data Replication Strategy:** * **Database (PostgreSQL):** * **Within a Region:** Synchronous streaming replication is used between the primary database instance and its local replica(s). This ensures zero data loss in case of a primary failure within the same data center. * **Across Regions:** Asynchronous streaming replication is used from the primary in the active region to a replica in the passive/standby region. This provides disaster recovery capabilities with minimal performance impact on the primary write path. In an active-active setup, each region's primary would asynchronously replicate to the other. * **Distributed Cache (Redis Cluster):** Data is sharded and replicated across multiple nodes within each Redis cluster. Each shard has a primary and at least one replica. In case of a node failure, a replica is promoted to primary. **Trade-offs (CAP Theorem):** * **Choice:** Prioritize **Availability (A)** and **Partition Tolerance (P)** over strong **Consistency (C)** for read operations (redirects), while maintaining strong consistency for write operations (shortening) within a database shard. * **Justification:** For a URL shortening service, it is more critical that the service remains available and responsive (low latency) even during network partitions or data center failures. Users expect redirects to work quickly. 
A brief period where a newly created short URL might not be immediately resolvable globally (due to replication lag) is generally acceptable.
* **How it's achieved:**
    * **Availability:** Multi-region deployment, global load balancing, database read replicas, and a highly available distributed cache ensure that requests can always be served.
    * **Partition Tolerance:** The distributed nature of the system (sharding, replication) inherently handles network partitions.
    * **Consistency:** For writes, strong consistency is maintained within a database shard. Cross-region replication is asynchronous, so there may be a small window of inconsistency between regions for newly created URLs. The cache also serves as an eventually consistent layer, as it is updated on writes and on cache misses.

F) Trade-off Discussion:

1. **Consistency vs. Availability for Redirects (CAP Theorem):**
    * **Choice:** Prioritize **Availability** and **Partition Tolerance** over immediate **Strong Consistency** for the redirect path.
    * **Why:** The primary goal of a URL shortener is to provide fast and reliable redirects. If a newly created URL takes a few seconds to propagate to all read replicas and caches globally, that is a minor inconvenience. If the system instead waited for global consistency before serving a redirect, it would introduce significant latency and reduce availability during network partitions or regional failures. Users would rather experience a rare 404 for a brand-new URL than frequent timeouts or slow redirects.
    * **Sacrifice:** A small window of eventual consistency. A user might create a URL in Region A, and if they immediately try to access it from Region B (before asynchronous replication completes), they might temporarily receive a 404. In extreme multi-region failure scenarios with asynchronous replication, there is a small risk of data loss for very recent writes if the primary region fails catastrophically before replication to the secondary region completes.
    * **Gain:** High availability (99.9% uptime) and very low latency (sub-50ms p95) for the vast majority of redirect requests, which are the most frequent operations. The system remains operational and responsive even if a data center goes offline.

2. **Short Code Generation Method (Sequential ID to Base62 vs. Hashing the Long URL):**
    * **Choice:** Generate short codes by converting a unique, sequentially increasing ID (from a distributed ID generator or sharded auto-increment) to Base62.
    * **Why:** This method guarantees uniqueness by design, simplifying the write path and improving performance. It avoids the complexity and performance overhead of collision detection and resolution inherent in hash-based approaches.
    * **Sacrifice:** The short codes may appear somewhat sequential (e.g., `abcde1`, `abcde2`) rather than completely random. This is a minor aesthetic concern for some users but does not impact functionality. It also requires a robust distributed ID generation mechanism, which adds a component to the architecture.
    * **Gain:**
        * **Guaranteed Uniqueness:** No collisions, eliminating the need for database lookups and retries during short code generation, which is critical for high write throughput (100M/month).
        * **Simpler Write Path:** Creating a new short URL is a straightforward insert operation after ID generation.
        * **Predictable Keyspace:** Easier to manage and to ensure the keyspace is sufficient for future growth.
        * **Flexibility:** Allows the same `long_url` to be shortened multiple times, yielding different `short_code`s if desired (e.g., for tracking different campaigns), which is not possible with a simple hash of the `long_url`.
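The ID-to-Base62 scheme above, together with the keyspace math from the prompt's constraints, can be sketched as follows. The alphabet ordering is an assumption (any fixed ordering of `0-9a-zA-Z` works), and the function names are illustrative, not from the answer.

```python
# 62-character alphabet: digits, lowercase, uppercase (ordering is arbitrary
# but must be fixed for the lifetime of the service).
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62

def encode_base62(n: int) -> str:
    """Convert a non-negative integer ID to a Base62 short code."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def decode_base62(code: str) -> int:
    """Inverse mapping: recover the numeric ID from a short code."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n

# Keyspace check from the stated constraints:
# 100M URLs/month * 12 months * 10 years = 12 billion IDs needed.
ids_needed = 100_000_000 * 12 * 10
# 62**6 = 56,800,235,584 already covers 12B, so codes stay within 6
# characters for the full 10-year horizon; 62**7 ~= 3.52 trillion gives
# roughly 290x headroom beyond that.
assert 62 ** 6 > ids_needed
assert 62 ** 7 > ids_needed
```

Because every ID maps to exactly one code and back, uniqueness follows from the uniqueness of the IDs, with no collision check needed on the write path.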

Result: #2
Winning Votes: 0 / 3
Average Score: 74

Total Score: 65

Overall Comments

Answer B is a solid, well-structured system design that covers all required sections adequately. It provides clear explanations and reasonable justifications for design choices. However, it lacks some depth compared to Answer A: the storage estimate uses a low average URL length (100 bytes vs the more realistic 200 bytes), the CDN discussion is weaker (claiming CDN doesn't directly cache 302 redirects, which is inaccurate), the reliability section mislabels itself as section D instead of E, and the trade-off section only identifies two trade-offs (the minimum required) without offering mitigations. The caching strategy discussion is adequate but less detailed. The write QPS and read QPS calculations are not explicitly provided. The fallback chain for data center failures is less detailed than Answer A's.


Architecture Quality (Weight 30%): 65

Answer B presents a reasonable architecture with sharded PostgreSQL, Redis Cluster, and read replicas. However, it lacks explicit QPS calculations, the CDN discussion incorrectly suggests CDN cannot cache 302 redirects, and the architecture is less detailed in terms of optimization techniques. The choice of PostgreSQL over a NoSQL solution for this workload is defensible but less optimal for the access pattern described.

Completeness (Weight 20%): 65

Answer B covers all six sections but with less depth. The storage estimate uses a low average URL length (100 bytes) and arrives at 3-5TB without accounting for multi-region replication. Section E is mislabeled as section D. The API design lacks rate limiting and idempotency. Only two trade-offs are provided (the minimum required) without mitigations.

Trade-off Reasoning (Weight 20%): 60

Answer B identifies two trade-offs: consistency vs availability and sequential ID vs hashing. The reasoning is sound but more surface-level. The first trade-off largely repeats the CAP discussion from section E. No mitigations are offered for the sacrifices identified. The discussion of sequential codes being an aesthetic concern understates the security implications.

Scalability & Reliability (Weight 20%): 65

Answer B discusses multi-region deployment, synchronous intra-region and asynchronous cross-region replication, and automated failover. However, the fallback chain for failures is less detailed, the latency analysis lacks specific numbers for each hop, and the discussion of how exactly 50ms p95 is achieved is more general. The active-passive vs active-active distinction is mentioned but not fully resolved.

Clarity (Weight 10%): 70

Answer B is clearly structured and readable with good use of formatting. However, the mislabeling of section E as section D is a notable organizational error. The use of markdown code blocks for JSON examples adds clarity to the API section. Overall presentation is clean but slightly less detailed in its organization.

Total Score: 81

Overall Comments

Answer B presents a very strong and competent system design. It covers all the required sections with solid reasoning, proposing a viable architecture based on a sharded relational database. The explanations for scaling, reliability, and trade-offs are clear and correct. However, compared to Answer A, it lacks some of the finer details and advanced considerations, such as a comprehensive CDN strategy, a more detailed API with idempotency, and a more thorough storage estimation that accounts for multi-region replication overhead. While a very good answer, it is slightly less detailed and sophisticated than its counterpart.


Architecture Quality (Weight 30%): 75

The architecture, based on a sharded relational database, is viable and well-justified. However, for this specific workload (100:1 read/write ratio, simple key-value lookups), a NoSQL solution is generally a better fit. The proposed architecture is solid but less optimized for the problem's specific constraints compared to Answer A.

Completeness (Weight 20%): 85

The answer is very complete and addresses all six required sections. The API design covers the core requirements but is less detailed than Answer A, omitting aspects like idempotency, deletion, or analytics endpoints.

Trade-off Reasoning (Weight 20%): 85

The answer provides two well-chosen and clearly explained trade-offs. The reasoning is strong, correctly identifying the gains and sacrifices for choosing availability over consistency and using an ID-based generation method. The quality of the reasoning is high, though Answer A provided an additional relevant trade-off.

Scalability & Reliability (Weight 20%): 80

The discussion on scalability and reliability is strong. It correctly identifies key strategies like read replicas, database sharding, caching, and multi-region deployment. However, it lacks the emphasis on a CDN/edge layer, which is a significant component for a global service with a strict latency requirement. The graceful degradation plan is also less detailed than in Answer A.

Clarity (Weight 10%): 85

The answer is very clear and well-organized, following the prompt's structure. The use of JSON snippets for the API design is helpful. The overall presentation is easy to follow and understand.

Judge Models OpenAI GPT-5.4

Total Score: 77

Overall Comments

Answer B is solid and generally complete, with clear APIs, reasonable schema, base62 math, caching, sharding, and a CAP discussion. However, it is less rigorous and less tailored to the stated constraints than Answer A. The choice of sharded PostgreSQL is not as convincingly justified for this scale and global availability target, the storage estimate appears optimistic, and the reliability section has some inconsistency and labeling issues. It also provides fewer concrete operational details for degradation, cache invalidation, and cross-region behavior.


Architecture Quality (Weight 30%): 72

Presents a sensible high-level architecture with sharded relational DB, replicas, cache, and multi-region deployment, but the database choice is less persuasive for this access pattern and global scale. Several parts stay at a generic level and are less operationally precise.

Completeness (Weight 20%): 82

Covers all required sections and includes the main expected elements, but some sections are thinner and less specific. Reliability is mislabeled as a second D section, and graceful degradation and failure handling are not explored as concretely as in Answer A.

Trade-off Reasoning (Weight 20%): 78

Includes two legitimate trade-offs with clear pros and cons, especially around CAP and ID-based generation. However, the discussion is narrower and less nuanced, with fewer mitigations and less exploration of downstream operational consequences.

Scalability & Reliability (Weight 20%): 74

Has the main elements for scaling and reliability, including cache, replicas, sharding, and multi-region failover, but is less convincing on cross-region operation and degradation. The PostgreSQL-based design at this scale is less robustly defended, and some statements around active-passive vs active-active replication and availability trade-offs remain vague or internally mixed.

Clarity (Weight 10%): 81

Clear and structured, with straightforward prose and good formatting. It is easy to read, but some sections are repetitive and slightly less precise, and the section-labeling error mildly hurts polish.

Comparison Summary

Final rank order is determined by judge-wise rank aggregation (average rank + Borda tie-break). Average score is shown for reference.

Judges: 3

Answer A — Winning Votes: 3 / 3, Average Score: 88
Answer B — Winning Votes: 0 / 3, Average Score: 74

Judging Results

Judge Models OpenAI GPT-5.4

Why This Side Won

Answer A wins because it scores higher on the most heavily weighted criteria, especially architecture quality and scalability/reliability. It provides a more production-ready design for a read-heavy, globally distributed URL shortener: a better-matched storage choice for key-based access, more detailed multi-region failure handling, clearer latency strategy, and stronger quantitative reasoning around throughput, storage, and caching. Answer B is competent, but its architecture is less robust for the stated constraints and contains weaker operational depth and some reliability ambiguities.

Why This Side Won

Answer A is the winner due to its superior depth, detail, and the selection of a more fitting architecture for a web-scale, read-heavy service. Answer A's choice of a distributed key-value store is better suited to the problem's constraints than Answer B's relational database. Furthermore, Answer A's scaling strategy is more robust, with a crucial emphasis on CDN/edge caching, which is critical for meeting global low-latency requirements. Its storage estimations are more thorough, and its API design is more comprehensive. While both answers are excellent, Answer A demonstrates a higher level of expertise in designing large-scale distributed systems.

Why This Side Won

Answer A wins due to superior depth across all criteria, particularly in architecture quality (detailed multi-layer caching with CDN + regional cache, explicit QPS calculations, connection pooling, async analytics), completeness (additional endpoints, idempotency support, negative caching, Feistel network mitigation), trade-off reasoning (three well-articulated trade-offs with mitigations vs two basic ones), and scalability/reliability (detailed fallback chain, region-aware ID allocation, explicit latency breakdown). The weighted scoring favors Answer A significantly on the highest-weighted criterion (Architecture Quality at 30%) and meaningfully on all other criteria.
