Design a Global URL Shortening Service

Compare model answers for this System Design benchmark and review scores, judging comments, and related examples.

X f L

Contents

Task Overview

Benchmark Genres

System Design

Task Creator Model The task creator is randomly selected from top task-generation models of supported providers.

OpenAI GPT-5.4

Answering Models In this benchmark, models from the same provider as the task creator are excluded from answering.

Answer A Anthropic Claude Opus 4.6

Answer B Google Gemini 2.5 Pro

Judge Models Judging uses exactly 3 judge models, excluding the answering models. At least 1 judge is selected from flagship models, lightweight models are not selected as judges, and the 3 judges come from 3 distinct providers.

OpenAI GPT-5.4 Anthropic Claude Sonnet 4.6 Google Gemini 2.5 Flash

Task Prompt

Show more ▼

Design a public URL shortening service similar to Bitly. The service must let users create short links for long URLs, optionally specify a custom alias if available, and redirect users who visit the short link to the original destination. Include a basic analytics feature that reports total clicks per link and clicks by day for the last 30 days. Assume the following constraints: - 120 million new short links are created per month. - 1.2 billion redirect requests are served per month. - Read traffic is highly bursty, especially for viral links. - The service is used globally and users expect low-latency redirects. - Short links should remain valid for at least 5 years. - Redirect availability target is 99.99 percent. - Analytics may be eventually consistent by up to 10 minutes. - The system should prevent obvious abuse at a basic level, but a full trust and safety platform is out of scope. In your design, cover: - High-level architecture and main components. - Data model and storage choices for link mappings and analytics. - ID or token generation strategy, including custom alias handling. - API design for creating links, redirecting, and fetching analytics. - Caching, partitioning, and replication strategy. - Reliability approach, including failure handling and multi-region considerations. - How you would scale for read-heavy traffic and viral hotspots. - Key trade-offs in consistency, cost, latency, and operational complexity. State any reasonable assumptions you make and justify your choices.

Task Context

The answer should be self-contained and should not assume access to any existing company infrastructure. The design can use common cloud primitives and standard distributed systems patterns.

Judging Policy

Show more ▼

A strong answer should present a coherent end-to-end architecture that clearly maps requirements to components and explains why each major design choice fits the workload. It should distinguish hot-path redirect serving from slower analytics processing, choose storage systems that match access patterns, and propose a practical token generation approach that avoids collisions and supports custom aliases. Good answers should address global deployment, caching, sharding or partitioning, replication, failover, and strategies for handling bursty read traffic and hotspot links. They should discuss reliability mechanisms such as redundancy, graceful degradation, backpressure, and eventual consistency for analytics. The best answers acknowledge realistic trade-offs, estimate scale at a rough order-of-magnitude level, and avoid unnecessary complexity while still meeting the stated availability and latency goals.

Answers

Answer A: Anthropic Claude Opus 4.6

Design: Global URL Shortening Service

REQUIREMENTS AND BACK-OF-THE-ENVELOPE ESTIMATES

Functional requirements:

Create a short link for a given long URL (with optional custom alias).
Redirect visitors of the short link to the original URL (HTTP 301/302).
Basic analytics: total clicks per link, clicks by day for the last 30 days.

Non-functional requirements:

120M new links per month, roughly 46 writes per second average, peak around 200 per second.
1.2B redirects per month, roughly 463 per second average, peak around 5,000-50,000 per second for viral links.
99.99% redirect availability.
Global l...

Show Full Answer ▼

Design: Global URL Shortening Service

REQUIREMENTS AND BACK-OF-THE-ENVELOPE ESTIMATES

Functional requirements:

Create a short link for a given long URL (with optional custom alias).
Redirect visitors of the short link to the original URL (HTTP 301/302).
Basic analytics: total clicks per link, clicks by day for the last 30 days.

Non-functional requirements:

120M new links per month, roughly 46 writes per second average, peak around 200 per second.
1.2B redirects per month, roughly 463 per second average, peak around 5,000-50,000 per second for viral links.
99.99% redirect availability.
Global low-latency redirects.
Links valid for at least 5 years.
Analytics eventually consistent within 10 minutes.
Basic abuse prevention.

Storage estimate:

120M links per month times 60 months (5 years) equals 7.2 billion links.
Each link record roughly 500 bytes (short code, long URL, metadata) gives about 3.6 TB of link data over 5 years.
Analytics data is additional but manageable with aggregation.

HIGH-LEVEL ARCHITECTURE

The system is composed of the following major components:

a) API Gateway and Load Balancer: Entry point for all traffic. Handles TLS termination, rate limiting, authentication for link creation, and routing. Deployed in multiple regions behind a global anycast DNS or a global load balancer (e.g., AWS Global Accelerator or Cloudflare).

b) Link Creation Service: Stateless service that handles POST requests to create new short links. Validates input, generates or reserves short codes, checks for custom alias availability, applies basic abuse checks, and writes to the primary database.

c) Redirect Service: Stateless, read-optimized service that handles GET requests for short codes. Looks up the short code in cache first, then database, and returns an HTTP 301 or 302 redirect. Also emits a click event asynchronously for analytics.

d) Analytics Service: Consumes click events from a message queue, aggregates them, and stores daily and total counts. Serves analytics queries.

e) Cache Layer: Distributed cache (Redis or Memcached clusters) deployed in each region to serve hot short codes with sub-millisecond latency.

f) Primary Database: Stores the canonical link mappings. A distributed database like Amazon DynamoDB, Google Cloud Spanner, or CockroachDB.

g) Message Queue: Kafka or Amazon Kinesis for buffering click events between the redirect service and the analytics pipeline.

h) CDN / Edge Layer: For the most popular links, redirect responses can be cached at the CDN edge (using 301 with appropriate cache headers or edge workers that perform the lookup).

Architecture flow:

Link creation: Client -> API Gateway -> Link Creation Service -> Primary DB (write) -> Invalidate/populate cache -> Return short URL.
Redirect: Client -> CDN/Edge -> (cache miss) -> API Gateway -> Redirect Service -> Cache -> (cache miss) -> DB -> Return 302 redirect. Asynchronously emit click event to message queue.
Analytics query: Client -> API Gateway -> Analytics Service -> Analytics DB -> Return results.

DATA MODEL AND STORAGE

Link Mapping Table (Primary Store - DynamoDB or similar):

short_code (partition key): string, 7 characters, e.g., "aB3x9Kz"
long_url: string, the original URL, up to 2048 characters
user_id: string, optional, the creator
custom_alias: boolean, whether this was a custom alias
created_at: timestamp
expires_at: timestamp (created_at + 5 years by default)
click_count: integer (eventually consistent counter, updated periodically)
status: enum (active, disabled, expired)

Why DynamoDB: Single key-value lookup pattern is a perfect fit. It scales horizontally with consistent single-digit millisecond latency. Partition key is the short_code, which distributes well given the random nature of generated codes.

Analytics Store:

Option A: A time-series table in DynamoDB or Cassandra with partition key = short_code and sort key = date (YYYY-MM-DD), with a click_count attribute.
Option B: Pre-aggregated daily counts stored in a separate table, with a TTL of 30 days for the daily granularity rows.

Schema for daily analytics table:

short_code (partition key): string
date (sort key): string, YYYY-MM-DD format
click_count: integer
TTL: timestamp, 30 days from the date

This allows efficient range queries: get all daily counts for a short_code in the last 30 days.

For total click counts, we maintain a running counter in the main link mapping table, updated by the analytics pipeline.

ID / TOKEN GENERATION STRATEGY

Requirements: 7.2 billion unique codes over 5 years. Using base62 encoding (a-z, A-Z, 0-9), a 7-character code gives 62^7 = 3.5 trillion possible combinations, which is more than sufficient.

Approach: Pre-generated ID ranges using a distributed counter or range-based allocation.

Primary strategy:

Use a central ID generation service (like Twitter Snowflake or a simpler counter service) that allocates ranges of numeric IDs to each Link Creation Service instance. For example, each instance requests a block of 10,000 IDs at a time.
Each numeric ID is then encoded to base62 to produce the 7-character short code.
This avoids coordination on every write while ensuring global uniqueness.

Alternative considered: Random generation with collision check. This works but requires a read-before-write to check for collisions, adding latency. With 7.2B codes out of 3.5T possible, collision probability is low (about 0.2%), but it still requires the check. The range-based approach is more deterministic.

Custom alias handling:

When a user requests a custom alias, the service performs a conditional write (PutItem with condition that the short_code does not already exist) to the database.
If the condition fails, the alias is taken, and we return an error to the user.
Custom aliases are validated: minimum 4 characters, maximum 30 characters, alphanumeric plus hyphens, checked against a blocklist of reserved words and offensive terms.
Custom aliases are stored in the same table as generated codes, with the custom_alias flag set to true.

API DESIGN

All APIs are RESTful over HTTPS.

a) Create Short Link:
POST /api/v1/links
Headers: Authorization: Bearer (optional for anonymous, required for analytics access)
Request body:
long_url: required, the destination URL (validated for format and basic safety)
custom_alias: optional, desired short code
expires_in_days: optional, default 1825 (5 years)
Response (201 Created):
short_code: "aB3x9Kz"
short_url: "https://sho.rt/aB3x9Kz"
long_url: "https://example.com/very/long/path"
created_at: "2025-01-15T10:30:00Z"
expires_at: "2030-01-15T10:30:00Z"
Error responses: 400 (invalid URL), 409 (custom alias taken), 429 (rate limited)

b) Redirect:
GET /{short_code}
Response: 302 Found with Location header set to the long URL.
We use 302 (temporary redirect) rather than 301 (permanent redirect) so that browsers do not cache the redirect permanently, allowing us to track clicks and potentially update the destination. However, for performance, we can use 301 at the CDN edge with a controlled TTL.
Error responses: 404 (not found or expired), 410 (disabled)

c) Get Analytics:
GET /api/v1/links/{short_code}/analytics
Headers: Authorization: Bearer
Response (200 OK):
short_code: "aB3x9Kz"
total_clicks: 154302
daily_clicks: list of objects with date and count for the last 30 days
Error responses: 401 (unauthorized), 404 (link not found)

d) Delete / Disable Link:
DELETE /api/v1/links/{short_code}
Headers: Authorization: Bearer
Response: 204 No Content

CACHING, PARTITIONING, AND REPLICATION STRATEGY

Caching:

Layer 1 - CDN Edge Cache: For the redirect path, we can cache 302 responses at the CDN edge with a short TTL (e.g., 5 minutes). This handles viral links extremely well since the CDN absorbs the majority of traffic. We use Cache-Control headers with a short max-age. Edge workers (Cloudflare Workers, Lambda@Edge) can also perform the lookup directly from a regional cache.
Layer 2 - Regional Redis Cluster: Each region has a Redis cluster that caches short_code to long_url mappings. Cache TTL of 24 hours. LRU eviction policy. This handles the vast majority of redirect lookups without hitting the database.
Layer 3 - Application-level local cache: Each redirect service instance maintains a small in-process LRU cache (e.g., 100K entries) for the hottest links.

Cache sizing: With 1.2B redirects per month and a Zipfian distribution, the top 20% of links likely account for 80% of traffic. Caching the top 10M active links in Redis requires roughly 10M times 300 bytes = 3 GB per region, which is very manageable.

Cache invalidation: On link deletion or update, we publish an invalidation event to all regions via the message queue. Cache entries also have TTLs as a safety net.

Partitioning:

DynamoDB automatically partitions by the short_code hash key. The random nature of generated codes ensures even distribution.
For custom aliases, the distribution is less predictable, but DynamoDB adaptive capacity handles hot partitions.
Redis is partitioned using consistent hashing across cluster nodes.

Replication:

DynamoDB Global Tables provide multi-region replication with eventual consistency (typically sub-second). We designate one region as the primary for writes (link creation) and all regions can serve reads.
Alternatively, with CockroachDB or Spanner, we get strongly consistent multi-region reads, but at higher latency cost for writes.
Redis clusters are replicated within each region (primary-replica). Cross-region cache is populated independently via database replication and local cache warming.

RELIABILITY APPROACH

Availability target: 99.99% for redirects means at most 4.3 minutes of downtime per month.

Multi-region deployment:

Deploy the redirect service in at least 3 geographically distributed regions (e.g., US-East, EU-West, AP-Southeast).
Use global DNS-based routing (Route 53 latency-based routing or anycast) to direct users to the nearest region.
Each region is independently capable of serving redirects from its local cache and database replica.

Failure handling:

If the primary database region fails, another region is promoted. With DynamoDB Global Tables, any region can accept writes, so there is no single write leader to fail over.
If Redis in a region fails, the redirect service falls back to the database. The database can handle the load temporarily, and Redis recovers quickly.
If the analytics pipeline (Kafka) has issues, click events are buffered. Kafka's durability ensures no data loss. Analytics being eventually consistent by up to 10 minutes gives us headroom.
Circuit breakers are implemented between services. If the database is slow, the redirect service serves from cache and degrades gracefully (returns cached results or a temporary error for cache misses).

Health checks and monitoring:

Each service instance has health check endpoints.
Load balancers remove unhealthy instances automatically.
Comprehensive monitoring with dashboards for latency percentiles (p50, p95, p99), error rates, cache hit ratios, and queue lag.
Alerting on SLO violations.

Data durability:

DynamoDB provides 99.999999999% durability with cross-region replication.
Regular backups as an additional safety net.

SCALING FOR READ-HEAVY TRAFFIC AND VIRAL HOTSPOTS

The read-to-write ratio is approximately 10:1 (1.2B reads vs 120M writes per month), but during viral events, a single link can receive millions of hits per hour.

Strategies:

CDN edge caching is the first and most effective defense. A viral link's redirect response is cached at hundreds of edge locations worldwide. Even a 5-minute TTL means the origin only sees one request per 5 minutes per edge location.
Edge compute (Cloudflare Workers or Lambda@Edge) can perform the redirect lookup entirely at the edge by reading from a distributed KV store (like Cloudflare KV or DynamoDB DAX), eliminating the need to hit the origin at all.
Redis cluster auto-scaling: Monitor cache load and add read replicas dynamically.
Redirect service auto-scaling: Stateless services scale horizontally based on CPU and request count metrics.
For extreme hotspots, the application-level local cache on each redirect service instance ensures that even if Redis is under pressure, the hottest links are served from memory.

Analytics during viral events:

Click events are produced to Kafka, which handles bursty writes well.
The analytics consumer can batch and aggregate before writing to the analytics store, reducing write amplification.
We use approximate counting if needed (HyperLogLog for unique visitors), but for total clicks, simple counters suffice.

ABUSE PREVENTION

Basic measures (full trust and safety is out of scope):

Rate limiting on link creation: per IP and per authenticated user (e.g., 100 links per hour for anonymous, 1000 for authenticated).
URL validation: reject malformed URLs, check against known phishing/malware URL blocklists (e.g., Google Safe Browsing API).
Custom alias validation: blocklist of offensive and reserved words.
CAPTCHA for anonymous link creation if rate limits are approached.
Ability to disable links reported as abusive (manual or automated).
Logging and audit trail for all link creation events.

KEY TRADE-OFFS

Consistency vs. Latency:

We choose eventual consistency for the link mapping across regions. A newly created link might not be resolvable in a distant region for a few hundred milliseconds. This is acceptable because the user who creates the link is routed to the nearest region where the write is immediately consistent, and cross-region replication is fast. For analytics, we explicitly accept 10-minute eventual consistency.

302 vs. 301 Redirects:

302 (temporary) allows us to track every click and change destinations, but increases origin load. 301 (permanent) is more efficient but browsers cache it indefinitely. Our compromise: use 302 from the origin but allow CDN edge caching with a short TTL, giving us the best of both worlds.

Cost vs. Performance:

DynamoDB on-demand pricing is more expensive per request than provisioned, but handles bursty traffic without capacity planning. For a service at this scale, we use provisioned capacity with auto-scaling for the baseline and on-demand for overflow.
CDN and edge compute add cost but dramatically reduce origin infrastructure needs and improve user experience.

Operational Complexity vs. Reliability:

Multi-region deployment with DynamoDB Global Tables adds operational complexity but is necessary for the 99.99% availability target and global low-latency requirement.
Kafka adds another system to operate but provides the decoupling and durability needed for reliable analytics.

Storage vs. Computation:

We pre-aggregate daily analytics counts rather than computing them on the fly from raw click events. This trades storage efficiency for query performance and simplicity.

Custom Alias Uniqueness:

Custom aliases share the same namespace as generated codes. We reserve the generated code space (exactly 7 base62 characters) and require custom aliases to be either a different length or checked against the generated range. Simpler approach: all codes (generated and custom) go into the same table, and the conditional write ensures uniqueness regardless.

SUMMARY OF TECHNOLOGY CHOICES

Global Load Balancing: AWS Global Accelerator or Cloudflare with anycast DNS
CDN and Edge: CloudFront or Cloudflare with edge workers
Compute: Kubernetes clusters or ECS/Fargate in each region for stateless services
Primary Database: DynamoDB Global Tables (link mappings and daily analytics)
Cache: ElastiCache Redis clusters in each region
Message Queue: Amazon Kinesis or Apache Kafka (MSK) for click events
Analytics Processing: Kafka Streams or Flink for real-time aggregation
ID Generation: Range-based allocation service backed by a small coordination store (DynamoDB or ZooKeeper)
Monitoring: CloudWatch, Prometheus, Grafana, PagerDuty for alerting

This design handles the stated requirements with a clear separation of the write path (link creation), read path (redirects), and analytics path, each independently scalable. The layered caching strategy (edge, regional Redis, local) ensures that even extreme viral traffic is handled gracefully, while the multi-region architecture with DynamoDB Global Tables provides the durability and availability guarantees needed for a 99.99% SLA.

Result

#1 | Winner

Winning Votes

3 / 3

Average Score

Judge Models Google Gemini 2.5 Flash

Total Score

Overall Comments

Answer A provides a very comprehensive and well-reasoned design. It covers all aspects of the prompt in detail, including back-of-the-envelope estimates and a dedicated section for abuse prevention. The layered caching strategy, the nuanced discussion of 301 vs 302 redirects, and the detailed ID generation approach are particularly strong. The architecture is coherent, and the justifications for technology choices are sound, demonstrating a deep understanding of distributed systems principles.

View Score Details ▼

Architecture Quality

Weight 30%

The architecture is well-structured with clear component responsibilities and data flows. The choice of DynamoDB Global Tables and a multi-layered caching strategy is appropriate for the requirements. The nuanced discussion of 302 vs 301 redirects, leveraging CDN capabilities, is a strong point.

Completeness

Weight 20%

Answer A is very complete, covering all aspects of the prompt, including initial back-of-the-envelope estimates, detailed API design, and a dedicated section for basic abuse prevention, which Answer B omits.

Trade-off Reasoning

Weight 20%

Excellent discussion of key trade-offs, including consistency vs. latency, 302 vs. 301 redirects (with a practical compromise), cost vs. performance, and operational complexity. The reasoning is clear and well-justified.

Scalability & Reliability

Weight 20%

The design demonstrates strong scalability and reliability, with a multi-region active-active setup, layered caching (CDN, regional Redis, local), edge compute for viral links, and robust failure handling mechanisms. The use of Kafka for analytics decoupling further enhances reliability.

Clarity

Weight 10%

The answer is very clear, well-organized with distinct sections, and easy to follow. The explanations are concise yet comprehensive, making the design understandable.

Judge Models Anthropic Claude Sonnet 4.6

Total Score

Overall Comments

Answer A is a comprehensive, well-structured end-to-end design that covers all required dimensions with notable depth. It provides back-of-the-envelope estimates, a detailed data model with schema specifics, a clearly justified ID generation strategy with range-based allocation, a thorough multi-layer caching strategy (CDN edge, regional Redis, local in-process), explicit failure handling with circuit breakers and graceful degradation, and a nuanced trade-off discussion including 301 vs 302 redirect semantics. The abuse prevention section and technology summary add practical completeness. Minor weaknesses include some verbosity and the analytics counter update mechanism (updating click_count in the main table periodically) could be explained more precisely, but overall the answer is thorough and coherent.

View Score Details ▼

Architecture Quality

Weight 30%

Answer A presents a coherent, layered architecture with clear separation of write path, read path, and analytics path. It specifies CDN edge workers, regional Redis, local in-process cache, Kafka for event buffering, and DynamoDB Global Tables. Each component is justified relative to the workload. The flow descriptions are precise and the component interactions are well-explained.

Completeness

Weight 20%

Answer A covers all eight required design areas plus adds abuse prevention, technology summary, and back-of-the-envelope estimates. The API design includes error codes, the data model includes TTL and status fields, and the analytics pipeline is described end-to-end. Very few gaps exist.

Trade-off Reasoning

Weight 20%

Answer A explicitly discusses 301 vs 302 redirect semantics and the compromise solution, eventual vs strong consistency with justification, cost vs performance for DynamoDB pricing models, operational complexity vs reliability, and storage vs computation for analytics pre-aggregation. These are concrete, workload-specific trade-offs.

Scalability & Reliability

Weight 20%

Answer A describes a three-layer caching strategy with specific TTLs and sizing estimates, Redis failover path to the database, circuit breakers, Kafka durability for analytics buffering, DynamoDB Global Tables for multi-region writes, and auto-scaling for both Redis and stateless services. The viral hotspot handling via CDN edge compute is well-articulated.

Clarity

Weight 10%

Answer A is well-organized with numbered sections and clear subsections. The length is substantial but each section adds value. Some sections are verbose (e.g., the summary repeats earlier content), but overall the structure aids navigation and comprehension.

Judge Models OpenAI GPT-5.4

Total Score

Overall Comments

Answer A presents a coherent end-to-end design with solid scale estimates, clear separation of redirect and analytics paths, practical storage choices, layered caching, multi-region strategy, abuse controls, and explicit trade-off discussion. It covers nearly all requested areas in concrete terms. Its main weaknesses are some overreach and minor inconsistency, such as mixing DynamoDB Global Tables with a single primary write region narrative, and a somewhat muddled 301 versus 302 caching discussion.

View Score Details ▼

Architecture Quality

Weight 30%

The architecture is well structured, with clear separation of create, redirect, cache, database, queue, and analytics paths. It appropriately treats redirect serving as the critical hot path and analytics as asynchronous. Multi-region deployment and CDN plus edge strategy are well integrated, though a few technology combinations are somewhat overbroad.

Completeness

Weight 20%

It covers all requested topics: architecture, data model, token generation, custom aliases, APIs, caching, partitioning, replication, reliability, multi-region, hotspot scaling, abuse prevention, and trade-offs. It also includes useful rough estimates and storage sizing.

Trade-off Reasoning

Weight 20%

The answer explicitly discusses consistency versus latency, 302 versus 301 behavior, cost versus performance, storage versus computation, and operational complexity. Some trade-off framing is strong, though parts of the redirect caching discussion are slightly conflicted.

Scalability & Reliability

Weight 20%

This is a strong area for A. It gives layered caching, partitioning logic, regional replication, failover posture, queue-based buffering, circuit breakers, monitoring, and explicit viral hotspot strategies. It directly connects these mechanisms to the 99.99 percent redirect availability goal.

Clarity

Weight 10%

The answer is organized, easy to follow, and broken into clear sections. It is long but mostly readable, with concrete bullets and rationale. A few sections are slightly dense and occasionally mix alternatives in a way that blurs the final recommendation.

Answer B: Google Gemini 2.5 Pro

1. High-Level Architecture

The system is designed as a globally distributed, multi-region architecture to ensure low latency and high availability. Each region acts as a self-contained Point of Presence (PoP).

Core Components:

Global Load Balancer (GLB): Uses DNS-based routing (e.g., latency-based, geo-based) to direct users to the nearest healthy region.
CDN (Content Delivery Network): Sits in front of our application servers to cache redirect responses (HTTP 301) for popular links at the edge, drastically reducing latency and load on our infrastructure for viral content.
**Regiona...

Show Full Answer ▼

1. High-Level Architecture

The system is designed as a globally distributed, multi-region architecture to ensure low latency and high availability. Each region acts as a self-contained Point of Presence (PoP).

Core Components:

Global Load Balancer (GLB): Uses DNS-based routing (e.g., latency-based, geo-based) to direct users to the nearest healthy region.
CDN (Content Delivery Network): Sits in front of our application servers to cache redirect responses (HTTP 301) for popular links at the edge, drastically reducing latency and load on our infrastructure for viral content.
Regional Load Balancer: Distributes traffic within a region to the stateless web servers.
Web Servers (API Gateway): Terminate TLS and route requests to the appropriate backend service.
Redirect Service (Read Path): A highly optimized, stateless microservice that handles GET /{short_id} requests. It performs a lookup (cache-first, then database) and issues a redirect. This is the most critical and highest-traffic component.
Link Service (Write Path): A microservice that handles POST /api/v1/links for creating new short links. It coordinates with the ID Generator and writes to the database.
Analytics Service: A microservice that provides analytics data via GET /api/v1/links/{short_id}/analytics. It reads from a dedicated analytics data store.
ID Generator Service: A dedicated service (e.g., based on Snowflake) that generates globally unique, roughly time-sortable 64-bit IDs to be used for short links.
Message Queue (e.g., Kafka, AWS SQS): Decouples the critical redirect path from non-critical analytics processing. The Redirect Service publishes a lightweight "click event" to the queue for every successful redirect.
Analytics Ingestion Processor: A consumer service that reads from the message queue, processes click events, and updates the analytics data store and aggregate counters.

(Conceptual diagram link)

2. Data Model and Storage Choices

We will use two separate data stores optimized for their specific access patterns.

A. Link Mapping Store

Choice: A distributed NoSQL key-value store like Amazon DynamoDB with Global Tables or Apache Cassandra.
Justification: This choice is driven by the need for massive scalability, high availability, and low-latency key-based lookups. The primary read pattern is a direct lookup by short_id, which is a perfect fit for a key-value model. A multi-region, multi-master setup (like DynamoDB Global Tables) provides low-latency reads and writes for global users and built-in disaster recovery.
Schema (links table):
- short_id (String, Partition Key): The unique 7-character code or custom alias.
- long_url (String): The destination URL.
- created_at (Timestamp): Creation timestamp.
- total_clicks (Number): An atomic counter for total lifetime clicks, updated by the analytics processor.

B. Analytics Data Store

Choice: A wide-column or time-series database like Apache Cassandra or Amazon Timestream.
Justification: This store needs to handle a very high write throughput of click events and efficiently query data by time range (e.g., last 30 days). A wide-column store allows us to model this effectively.
Schema (clicks_by_day table):
- short_id (String, Partition Key): The link identifier.
- event_date (String, Clustering Key, format YYYY-MM-DD): The date of the clicks.
- daily_count (Counter): A distributed counter for the number of clicks on that day.

This pre-aggregated model is efficient for the required query ("clicks by day"). Raw event data could be stored in a cheaper object store (like S3) for long-term archival if needed.

3. ID Generation Strategy

Generated IDs: We need short, unique, non-sequential IDs. A 7-character string using Base62 encoding ([a-zA-Z0-9]) provides 62^7 (~3.5 trillion) unique combinations, which is more than sufficient for the 7.2 billion links expected over 5 years.
1. The ID Generator Service produces a globally unique 64-bit integer (similar to Twitter's Snowflake), which combines a timestamp, a region/machine ID, and a sequence number. This avoids a single point of failure and contention.
2. The Link Service requests an ID from the generator.
3. It then Base62-encodes the 64-bit integer to create the short_id string.
Custom Aliases:
1. When a user submits a custom alias, the Link Service performs a direct read on the links table using the alias as the key.
2. If the key does not exist, it proceeds to write the new entry using a conditional write (e.g., put if not exists) to prevent race conditions where two users claim the same alias simultaneously.
3. If the key exists, it returns a 409 Conflict error.

4. API Design

The service exposes a simple RESTful API.

Create Short Link: POST /api/v1/links
- Request Body: {"long_url": "https://example.com/very/long/path", "custom_alias": "my-link"} (custom_alias is optional)
- Success (201 Created): {"short_url": "https://sho.rt/my-link", "long_url": "..."}
- Error (409 Conflict): If the custom alias is already taken.
Redirect User: GET /{short_id}
- Example: GET /aB1x2Yc
- Success (301 Moved Permanently): Location: https://example.com/very/long/path
- Error (404 Not Found): If the short_id does not exist.
Get Analytics: GET /api/v1/links/{short_id}/analytics
- Success (200 OK): {"short_id": "aB1x2Yc", "total_clicks": 15023, "daily_clicks": [{"date": "2023-10-27", "clicks": 850}, {"date": "2023-10-26", "clicks": 1200}]}

5. Caching, Partitioning, and Replication

Caching: A multi-layer caching strategy is essential for handling read-heavy traffic and viral hotspots.
1. CDN Caching: The CDN caches 301 redirect responses at the edge. This is the first line of defense and handles the bulk of traffic for popular links.
2. In-Memory Cache: Each region has a distributed cache (e.g., Redis Cluster) that stores short_id -> long_url mappings. The Redirect Service follows a read-through pattern: check Redis first, and on a miss, query the database and populate the cache with a TTL (e.g., 24 hours).
3. Negative Caching: Cache "not found" results for a short period (e.g., 5 minutes) to prevent repeated database lookups for invalid or deleted links.
Partitioning: The NoSQL databases (DynamoDB/Cassandra) automatically partition data based on the short_id hash. This distributes the data and I/O load horizontally across the cluster, enabling massive scale.
Replication: The primary links database is replicated across all active regions in a multi-master configuration. This ensures that a link created in one region is quickly available for redirection in all other regions, providing low-latency reads globally and high data durability.

6. Reliability and Scaling

Reliability:
- Multi-Region Active-Active: The entire stack is deployed in at least three geographically separate regions. The GLB performs health checks and will automatically route traffic away from any failing region, allowing the service to survive a full regional outage.
- Stateless Services: All services (Redirect, Link, Analytics) are stateless, allowing them to be scaled horizontally and replaced easily if an instance fails.
- Decoupling: The message queue ensures that a slowdown or failure in the analytics pipeline does not impact the critical, user-facing redirect functionality.
Scaling for Read Traffic:
1. CDN: The CDN absorbs the majority of requests for viral links.
2. Horizontal Scaling: The stateless Redirect Service can be scaled out by adding more instances behind the regional load balancer.
3. Cache Scaling: The distributed cache can be scaled by adding more nodes to the cluster.
4. Database Scaling: The partitioned NoSQL database scales by adding more nodes.

7. Key Trade-offs

Consistency vs. Availability: We prioritize Availability over Strong Consistency. Using a multi-master database means write conflicts are possible (though rare for this use case). Eventual consistency is acceptable for link propagation across regions and is required for the analytics data (up to 10 minutes delay).
Cost vs. Performance: A global, multi-region, active-active architecture with a CDN and multiple data stores is expensive. However, it is necessary to meet the stringent 99.99% availability and global low-latency requirements. A cheaper single-region design would not suffice.
Operational Complexity: The microservices architecture adds complexity in terms of deployment, monitoring, and maintenance compared to a monolith. This trade-off is made to achieve independent scalability, fault isolation, and resilience for different parts of the system.

Result

Winning Votes

0 / 3

Average Score

Judge Models Google Gemini 2.5 Flash

Total Score

Overall Comments

Answer B presents a solid and clear high-level architecture. It effectively addresses global distribution, scalability, and reliability with appropriate technology choices. The structure is easy to follow, and the trade-off discussions are relevant. However, it is slightly less detailed than Answer A in certain areas, such as initial estimates and abuse prevention, and its choice of 301 redirects for the main path is less flexible for analytics and updates compared to Answer A's approach.

View Score Details ▼

Architecture Quality

Weight 30%

The high-level architecture is clear and logical, with a good separation of concerns into microservices. The choice of a multi-region active-active setup with a CDN is appropriate. However, the use of 301 redirects for the main path is less flexible for analytics and updates compared to Answer A's approach.

Completeness

Weight 20%

Answer B covers most of the prompt's requirements, including architecture, data model, ID generation, API, caching, and reliability. However, it lacks initial estimates and a specific section on abuse prevention, making it slightly less complete than Answer A.

Trade-off Reasoning

Weight 20%

Answer B provides a good discussion of trade-offs, focusing on consistency vs. availability, cost vs. performance, and operational complexity. The reasoning is sound, but it is slightly less detailed and nuanced compared to Answer A's trade-off analysis.

Scalability & Reliability

Weight 20%

Answer B outlines a robust approach to scalability and reliability through a multi-region active-active architecture, CDN caching, horizontal scaling of stateless services, and database partitioning. The use of a message queue for decoupling is also a good reliability measure. It is strong but slightly less detailed on advanced scaling for viral links beyond CDN compared to Answer A.

Clarity

Weight 10%

Answer B is very clear and concise, presenting the information in a well-structured and easy-to-read format. The use of headings and bullet points enhances readability.

Judge Models Anthropic Claude Sonnet 4.6

Total Score

Overall Comments

Answer B is a solid, readable design that covers the main components and makes reasonable choices. It correctly identifies the key data stores, caching layers, ID generation approach, and multi-region deployment. However, it is noticeably shallower in several areas: the back-of-the-envelope estimates are absent, the failure handling discussion lacks specifics (no circuit breakers, no graceful degradation detail, no Redis failover path), the analytics pipeline is underspecified (no mention of Kafka Streams/Flink or batching strategy), the abuse prevention section is missing entirely, and the trade-off discussion is brief and generic. The use of HTTP 301 for redirects without acknowledging the click-tracking problem is a meaningful oversight. The answer is competent but does not reach the depth expected for a senior-level system design benchmark.

View Score Details ▼

Architecture Quality

Weight 30%

Answer B identifies the correct major components and their roles, but the architecture description is higher-level and less precise. The interaction between components is described at a surface level, and some components like the analytics ingestion processor are mentioned but not elaborated. The overall structure is sound but lacks the depth and justification of Answer A.

Completeness

Weight 20%

Answer B covers most required areas but omits back-of-the-envelope estimates, abuse prevention, and provides only a brief analytics pipeline description. The API design is present but less detailed (missing error codes for redirect, no mention of auth headers). The reliability section lacks specifics on failover procedures and graceful degradation.

Trade-off Reasoning

Weight 20%

Answer B mentions three trade-offs (consistency vs availability, cost vs performance, operational complexity) but at a generic level. It does not address the 301 vs 302 redirect issue, does not discuss ID generation trade-offs, and does not engage with the analytics consistency window in a meaningful way. The trade-off discussion reads as boilerplate rather than design-specific reasoning.

Scalability & Reliability

Weight 20%

Answer B correctly identifies CDN, Redis, horizontal scaling, and multi-region active-active as the main reliability and scalability mechanisms. However, it does not discuss what happens when Redis fails, does not mention circuit breakers or backpressure, and the viral hotspot strategy is limited to CDN caching without discussing edge compute or local in-process caching. The reliability section is adequate but not thorough.

Clarity

Weight 10%

Answer B is concise and uses formatting (headers, bullet points) effectively. It is easier to skim but sacrifices depth for brevity. The clarity is good for what is written, but the shorter length means less information is conveyed overall.

Judge Models OpenAI GPT-5.4

Total Score

Overall Comments

Answer B gives a reasonable high-level architecture and correctly separates hot-path redirects from analytics. It chooses sensible technologies and mentions caching, partitioning, replication, and eventual consistency. However, it is noticeably thinner on capacity reasoning, failure handling details, data model completeness, abuse prevention, API nuances, and hotspot mitigation. Some choices are under-justified, and the answer remains more generic than benchmark-strong.

View Score Details ▼

Architecture Quality

Weight 30%

The architecture is sensible and clean, with appropriate separation between read path, write path, and analytics. However, it stays at a more generic service-box level and gives less detail on how components interact under failure or extreme hotspots.

Completeness

Weight 20%

It addresses most headline areas but with notable omissions or thin treatment. It lacks meaningful scale estimates, gives little on abuse prevention, limited API/error nuance, limited failure handling, and less detail on retention, expiration, and operational mechanisms.

Trade-off Reasoning

Weight 20%

It acknowledges major trade-offs such as availability versus consistency and cost versus performance, but the reasoning is brief and does not deeply explore design alternatives or their consequences.

Scalability & Reliability

Weight 20%

B shows sound instincts with active-active regions, stateless services, CDN, and queue decoupling. Still, it is lighter on concrete failover behavior, degraded-mode handling, cache invalidation, queue lag/backpressure, and region-level operational details needed for a stronger reliability story.

Clarity

Weight 10%

The answer is concise and well structured, making it easy to read quickly. However, its brevity also reduces precision, and some points remain too abstract to be fully actionable.

Comparison Summary

Final rank order is determined by judge-wise rank aggregation (average rank + Borda tie-break). Average score is shown for reference.

Judges: 3

Winner Anthropic Claude Opus 4.6

Winning Votes

3 / 3

Average Score

View this answer

Google Gemini 2.5 Pro

Winning Votes

0 / 3

Average Score

View this answer

View head-to-head record for this model pair

Judging Results

Judge Models OpenAI GPT-5.4

Claude Opus 4.6 Winner 87

Gemini 2.5 Pro 69

Why This Side Won

Answer A wins because it is substantially more complete and better reasoned across the full system design scope. It maps requirements to components more concretely, provides rough sizing, explains token generation and custom alias handling in more depth, details multi-layer caching and multi-region reliability strategies, and addresses abuse prevention and operational trade-offs. Answer B is competent but too high level and omits several important implementation and failure-mode considerations.

Judge Models Anthropic Claude Sonnet 4.6

Claude Opus 4.6 Winner 84

Gemini 2.5 Pro 64

Why This Side Won

Answer A wins across all major criteria. It provides quantitative estimates, a more detailed and justified data model, a superior caching strategy with three explicit layers, explicit failure handling mechanisms, a richer trade-off discussion (including the 301 vs 302 nuance), and covers abuse prevention. Answer B is correct in its high-level choices but lacks the depth, specificity, and completeness that the task and judging policy require. The gap is most pronounced in scalability/reliability detail, completeness, and trade-off reasoning.

Judge Models Google Gemini 2.5 Flash

Claude Opus 4.6 Winner 85

Gemini 2.5 Pro 79

Why This Side Won

Answer A is chosen as the winner due to its superior completeness and depth. It provides back-of-the-envelope estimates, explicitly covers abuse prevention, and offers a more nuanced and robust approach to redirect handling (302 from origin, 301 with short TTL at CDN). The detailed breakdown of ID generation, layered caching, and trade-offs further distinguishes it as a more thorough and well-considered design.

Design a Global URL Shortening Service

Task Overview

Task Prompt

Answers

Answer A: Anthropic Claude Opus 4.6

Answer B: Google Gemini 2.5 Pro

1. High-Level Architecture

1. High-Level Architecture

2. Data Model and Storage Choices

3. ID Generation Strategy

4. API Design

5. Caching, Partitioning, and Replication

6. Reliability and Scaling

7. Key Trade-offs

Comparison Summary

Judging Results

Related Tasks

Summarize a Town-Hall Debate on Urban Flood Resilience

Write a Funny Wedding Toast for Two Librarians

Low-Cost Ideas to Reduce Meeting Overload in a Remote Team

Write a Museum Exhibit Story Told by an Object

Ideas to reduce food waste in a small apartment building

One-Day Community Fair Recovery Plan After a Storm

Evaluate a Carbon Tax Policy Using Supply and Demand

Respond to a Friend After a Painful Job Rejection

Related Links