Design a URL Shortening Service

Compare model answers for this System Design benchmark and review scores, judging comments, and related examples.

X f L

Contents

Task Overview

Benchmark Genres

System Design

Task Creator Model The task creator is randomly selected from top task-generation models of supported providers.

Anthropic Claude Opus 4.6

Answering Models In this benchmark, models from the same provider as the task creator are excluded from answering.

Answer A OpenAI GPT-5.2

Answer B Google Gemini 2.5 Pro

Judge Models Judging uses exactly 3 judge models, excluding the answering models. At least 1 judge is selected from flagship models, lightweight models are not selected as judges, and the 3 judges come from 3 distinct providers.

OpenAI GPT-5.4 Anthropic Claude Opus 4.6 Google Gemini 2.5 Flash

Task Prompt

Show more ▼

Design a URL shortening service similar to bit.ly or TinyURL. Your design should address the following aspects: 1. **Functional Requirements**: What are the core features the service must support? Consider URL creation, redirection, expiration, and analytics. 2. **High-Level Architecture**: Describe the main components of the system (e.g., API layer, application servers, databases, caches, load balancers). Explain how they interact. 3. **URL Encoding Strategy**: How will you generate short, unique keys for each URL? Discuss your approach (e.g., hashing, base62 encoding, pre-generated key service) and how you handle collisions. 4. **Database Design**: What database(s) would you use and why? Provide the schema for the core table(s). Discuss the trade-offs between SQL and NoSQL for this use case. 5. **Scalability and Performance**: How would you handle high read traffic (e.g., millions of redirects per day)? Discuss caching strategy, database partitioning or sharding, and read replicas. 6. **Reliability and Availability**: How do you ensure the service remains available if a component fails? Discuss redundancy, replication, and failover strategies. 7. **Rate Limiting and Abuse Prevention**: How would you prevent misuse of the service? Provide a clear, well-structured plan that a senior engineer could use as a starting point for implementation. Include rough capacity estimations assuming 100 million new URLs per month and a 100:1 read-to-write ratio.

Judging Policy

Show more ▼

A strong answer should be evaluated on the following criteria: (1) Completeness — all seven aspects listed in the prompt should be addressed with meaningful detail, not just mentioned in passing. (2) Technical depth — the answer should demonstrate understanding of real-world trade-offs, such as SQL vs NoSQL, hashing vs counter-based ID generation, cache eviction policies, and consistency models. (3) Capacity estimation — the answer should include back-of-the-envelope math for storage, bandwidth, and QPS based on the given traffic numbers. (4) Clarity and structure — the plan should be logically organized and easy to follow. (5) Practicality — the proposed design should be realistic and implementable, not overly theoretical. Answers that only provide vague or surface-level descriptions without justifying choices or discussing trade-offs should be rated lower. Answers that include diagrams described in text, concrete schema definitions, and specific technology recommendations with rationale should be rated higher.

Answers

Answer A: OpenAI GPT-5.2

Functional Requirements
Core (MVP)

Create short URL
- Input: long_url, optional custom_alias, optional expiration, optional user_id/api_key
- Output: short_url, key, created_at, expiry_at
Redirect
- GET /{key} -> 301/302 to long_url
- Honor expiration/disabled state
Basic analytics
- Count redirects per key (total + time buckets)
- Optional: referrer, country/region (coarse), user-agent family, device type
Link management
- Retrieve metadata (long_url, created_at, expiry_at, status)
- Disable/delete (soft delete)
  Nice-to-have
Custom domains per customer
Bulk shortening API
QR code...

Show Full Answer ▼

Functional Requirements
Core (MVP)

Create short URL
- Input: long_url, optional custom_alias, optional expiration, optional user_id/api_key
- Output: short_url, key, created_at, expiry_at
Redirect
- GET /{key} -> 301/302 to long_url
- Honor expiration/disabled state
Basic analytics
- Count redirects per key (total + time buckets)
- Optional: referrer, country/region (coarse), user-agent family, device type
Link management
- Retrieve metadata (long_url, created_at, expiry_at, status)
- Disable/delete (soft delete)
  Nice-to-have
Custom domains per customer
Bulk shortening API
QR code generation
Anti-malware / safe browsing checks
A/B routing, UTM templates
Non-functional
Very low redirect latency (p95 < ~20–50ms from edge/cache)
High availability (multi-AZ/region)
Strong consistency not required for analytics, but required for key->URL mapping

High-Level Architecture
Traffic flow

DNS -> CDN/Edge (optional but recommended)
Global load balancer (GSLB) -> Regional L7 load balancer
API Gateway
- Auth (API keys/OAuth), throttling, request validation
Application services (stateless)
- Shorten Service (writes)
- Redirect Service (reads, extremely hot path)
- Analytics Ingest Service (async)
  Data layer
Primary Key-Value store for mapping key -> destination record
Cache layer (Redis/Memcached) for hot key lookups
Analytics pipeline
- Redirect Service emits event to a log/queue (Kafka/PubSub/Kinesis)
- Stream processor aggregates into OLAP store (ClickHouse/BigQuery/Druid) and/or time-series (Cassandra/Scylla)
- Periodic rollups for dashboards
  Supporting services
Key generation service (if using pre-generated IDs)
Abuse detection service (URL reputation, user behavior)
Observability: metrics, tracing, logs

Interaction

Create:
- Client -> API Gateway -> Shorten Service
- Validate URL, check abuse, optional custom alias uniqueness
- Obtain unique key (encoding strategy below)
- Write mapping to DB
- Invalidate/prime cache
Redirect:
- Client -> CDN/Edge -> Redirect Service
- Lookup key in cache; on miss, query DB
- If found and not expired/disabled: respond 301/302
- Emit async analytics event

URL Encoding Strategy
Goals: uniqueness, short length, high throughput, no central bottleneck.
Recommended: numeric ID + Base62

Use a monotonically increasing 64-bit ID (or time-ordered ID) and encode in Base62 (0-9a-zA-Z).
For 100M new URLs/month (~3.86k writes/sec average; higher peak), ID generation must support > tens of thousands/sec.
Options:
A) Database sequence (simple)
Pros: easy, strongly unique
Cons: can be bottleneck and hard across shards; requires coordination
B) Distributed ID (Snowflake-like) (recommended)
64-bit: timestamp + region/node + sequence
Pros: scalable, no single writer
Cons: slightly longer keys if you encode full 64-bit; still compact in Base62 (up to 11 chars)
C) Pre-generated key pool
Background job generates random Base62 strings, stores unused pool; app reserves keys.
Pros: decouples from ordering, can keep keys short
Cons: pool management complexity
Collision handling
For ID-based approach: no collisions by construction.
For custom aliases or random keys: enforce uniqueness with conditional put/unique constraint; on collision, retry with new key.
Key length
Base62 length needed: 100M/month implies ~1.2B/year. Base62^7 ≈ 3.5T so 7 chars is plenty if using sequential IDs; Snowflake IDs may be 10–11 chars but acceptable.

Database Design
Primary store requirements

Very high read QPS, key-based lookups, small records, low latency.
Strongly consistent writes for key uniqueness; reads can be eventually consistent if cache is correct, but prefer consistent read-after-write for new links.
Recommended: DynamoDB / Cassandra / ScyllaDB (NoSQL KV) OR MySQL/Postgres with sharding.
NoSQL KV pros: horizontal scale, high throughput, predictable latency.
SQL pros: constraints, transactions, simpler for custom alias uniqueness and admin queries; but sharding/replicas become more complex at scale.
Pragmatic choice
Mapping store: DynamoDB (or Cassandra/Scylla) as system of record.
Optional relational store for user/account/billing.
Core schema (KV / wide-column)
Table: url_mapping
key (partition key, string)
long_url (string)
created_at (timestamp)
expiry_at (timestamp, nullable)
status (active|disabled|deleted)
user_id (string/uuid, nullable)
custom_alias (bool)
domain (string, default)
last_accessed_at (timestamp, nullable)
redirect_code (int: 301/302)
Indexes / access patterns
Primary: key -> record
By user (for management UI): secondary index
- GSI: user_id as partition key, created_at as sort key (or reverse)
By long_url (optional dedupe): hash(long_url) index (only if you want “same long URL returns same key” behavior)
Analytics storage (separate)
Raw events in object storage (S3/GCS) + streaming aggregate into OLAP.
Aggregated table example (ClickHouse): (key, day/hour, redirects, unique_ips_approx, country, referrer_domain, ua_family)
SQL vs NoSQL trade-off summary
SQL: easier uniqueness for custom aliases, ad-hoc queries; harder to scale writes/reads without careful sharding.
NoSQL: best for primary lookup workload; must design access patterns upfront; uniqueness for custom aliases handled via conditional writes.

Scalability and Performance
Traffic estimates

Writes: 100M/month ≈ 3.86k/s average, plan for 10x peak => ~40k/s.
Reads: 100:1 => 386k/s average redirects, plan 10x peak => ~4M/s peak globally.
Storage
100M/month * 12 = 1.2B mappings/year.
Record size (key ~10B, URL avg 200B, metadata): assume ~500B–1KB.
1.2B * 1KB ≈ 1.2TB/year (plus replication and indexes).
Caching
Redis/Memcached cluster per region.
Cache key: short key; value: long_url + status + expiry_at + redirect_code.
TTL strategy:
- For non-expiring links: long TTL (e.g., 1–7 days) with refresh-on-access.
- For expiring links: TTL aligned with expiry.
Negative caching for missing/disabled keys (short TTL) to reduce DB hits.
CDN/Edge caching for redirects where safe:
- Cache 301 for public, non-expiring links; careful with per-user or dynamic redirects.
  Sharding/partitioning
NoSQL: partition by key; ensure uniform distribution.
If SQL: shard by key hash; maintain routing layer.
Read replicas
If using SQL or a replicated KV store: add read replicas for management/read-heavy non-redirect queries.
Hot keys
Extremely popular short URLs can overload cache nodes.
- Use consistent hashing with sufficient virtual nodes.
- Consider in-process LRU cache in redirect service.
- Edge caching at CDN reduces origin load.
  Write path optimization
Batch analytics events; never block redirect on analytics.

Reliability and Availability
Multi-AZ

Run API/Redirect services across multiple AZs behind load balancer.
Cache: Redis cluster with replication + automatic failover (or managed Redis).
DB: multi-AZ replication; quorum reads/writes as appropriate.
Multi-region (recommended for global service)
Active-active redirects: replicate mapping DB cross-region (DynamoDB global tables / Cassandra multi-DC).
Writes can be routed to nearest region; resolve conflicts:
- For ID-based keys, collisions are unlikely; custom aliases require global uniqueness—handle by routing custom alias creation to a “home region” per domain or using strongly consistent global coordination (rare path).
  Failover
Health checks + automated traffic shifting via GSLB.
Stateless services allow fast scaling and replacement.
Backups and DR
Continuous backups/snapshots of mapping store.
Store raw analytics logs in durable object storage.
Graceful degradation
If analytics pipeline is down, continue redirects and buffer events (queue retention) or sample.
If cache is down, redirect service falls back to DB (expect latency increase, but service remains functional).

Rate Limiting and Abuse Prevention
Rate limiting

Per API key/user/IP limits for creation endpoints (token bucket/leaky bucket at API Gateway).
Separate, higher limits for redirects; protect from floods with CDN/WAF.
Abuse controls
URL validation: allow-list schemes (http/https), max length, block internal IP ranges (SSRF-style) for preview/scanning components.
Malware/phishing detection:
- Integrate with Safe Browsing/reputation feeds.
- Asynchronous scanning; optionally “pending” state until scan passes for untrusted users.
Custom alias policies: reserved words, profanity filter, trademark policies.
Bot detection: device fingerprinting heuristics, CAPTCHA for suspicious creation traffic.
Redirect protection:
- WAF rules, anomaly detection on redirect patterns.
- Per-key redirect rate alerts; temporary throttling.

Capacity/Throughput Planning Summary (rough)

Peak redirects: target a few million requests/sec globally.
- CDN offload + regional caches should serve majority (aim >95% cache hit for hot set).
- Origin DB reads should be minimized; design for <5% misses.
Writes: tens of thousands/sec at peak; distributed ID generation avoids central bottleneck.
Data growth: ~1.2B mappings/year; plan multi-TB storage with replication and indexing overhead.

Implementation starting point (component list)

API Gateway + Auth
Shorten Service
Redirect Service (optimized, minimal dependencies)
Redis cache cluster per region
Mapping DB (DynamoDB/Cassandra/Scylla)
ID generator (Snowflake) library/service
Kafka/Kinesis + stream processors + OLAP store
Admin/management service + user DB
Monitoring/alerting + WAF/CDN

Result

#1 | Winner

Winning Votes

3 / 3

Average Score

Judge Models Google Gemini 2.5 Flash

Total Score

Overall Comments

The design for the URL shortening service is exceptionally comprehensive, well-structured, and technically sound. It addresses all prompt requirements with significant depth, offering practical solutions and justified trade-offs. Strengths include detailed architectural components, a robust URL encoding strategy, thoughtful database design with schema, and extensive coverage of scalability, reliability, and abuse prevention. The capacity estimations are integrated effectively. The plan is clear, concise, and provides a solid foundation for implementation, demonstrating an excellent understanding of distributed systems design.

View Score Details ▼

Architecture Quality

Weight 30%

The high-level architecture is very well-defined, delineating clear components such as API Gateway, separate services for writes and reads (Shorten, Redirect), and an asynchronous analytics pipeline. The proposed data layer with a primary KV store, cache, and OLAP for analytics is appropriate for the workload. The interaction flows for create and redirect operations are precisely described, highlighting the critical role of caching for the hot redirect path and considering global distribution.

Completeness

Weight 20%

The answer provides a complete and detailed response to all seven aspects of the prompt. It covers functional and non-functional requirements, a comprehensive high-level architecture, a well-reasoned URL encoding strategy, detailed database design with schema and trade-offs, robust scalability and reliability mechanisms, and practical abuse prevention strategies. The inclusion of rough capacity estimations and an implementation starting point further enhances its completeness.

Trade-off Reasoning

Weight 20%

The answer demonstrates strong reasoning for various technical trade-offs. It clearly discusses the pros and cons of different URL encoding strategies (DB sequence vs. distributed ID vs. pre-generated pool) and justifies the choice of numeric ID + Base62. The detailed comparison between SQL and NoSQL for the primary data store, including their respective challenges for scaling and unique constraints, is excellent. Cache TTL strategies and multi-region conflict resolution are also well-considered.

Scalability & Reliability

Weight 20%

Scalability is thoroughly addressed through detailed traffic estimates, comprehensive caching strategies (Redis, CDN, negative caching), sharding/partitioning, and hot key management. Reliability is equally well-covered with multi-AZ and multi-region deployments, robust replication, failover mechanisms, continuous backups, and strategies for graceful degradation. The proposed solutions are practical and robust, ensuring high availability and performance under heavy load.

Clarity

Weight 10%

The plan is exceptionally clear, well-structured, and easy to follow. The use of clear headings, subheadings, and bullet points makes the content highly digestible. The language is precise and technical, suitable for a senior engineer. Specific technology recommendations (e.g., DynamoDB, Cassandra, Snowflake, Redis, ClickHouse) are provided with context, further enhancing the clarity and practicality of the design.

Judge Models Anthropic Claude Opus 4.6

Total Score

Overall Comments

This is an excellent, comprehensive system design answer that addresses all seven required aspects with meaningful depth. It includes concrete capacity estimations, specific technology recommendations with rationale, detailed schema definitions, and thorough discussion of trade-offs. The answer is well-structured with clear sections, covers edge cases like hot keys and graceful degradation, and provides practical implementation guidance. Minor areas for improvement include slightly more detailed back-of-the-envelope math for bandwidth and a text-described architecture diagram, but overall this is a very strong response suitable as a senior engineer's starting point.

View Score Details ▼

Architecture Quality

Weight 30%

The architecture is well-designed with clear separation of concerns: stateless application services, dedicated read/write paths, async analytics pipeline via Kafka, caching layer, and CDN/edge. The interaction flows for both create and redirect are clearly described. The choice of Snowflake-like distributed ID generation is well-justified. The multi-region active-active design with DynamoDB global tables or Cassandra multi-DC is practical. The only minor gap is the lack of a text-based diagram, though the textual description of the flow is quite clear.

Completeness

Weight 20%

All seven aspects from the prompt are thoroughly addressed. Functional requirements include both core and nice-to-have features. The URL encoding strategy covers multiple approaches with pros/cons. Database design includes schema, access patterns, and indexes. Scalability covers caching, sharding, hot keys, and CDN. Reliability covers multi-AZ, multi-region, failover, backups, and graceful degradation. Rate limiting and abuse prevention are detailed. Capacity estimations are included with writes/sec, reads/sec, and storage calculations. The answer also includes non-functional requirements and an implementation component list.

Trade-off Reasoning

Weight 20%

Strong trade-off analysis throughout. SQL vs NoSQL is discussed with specific pros and cons for this use case. Three ID generation approaches are compared with clear reasoning for recommending Snowflake-like IDs. Cache TTL strategies differentiate between expiring and non-expiring links. The answer discusses 301 vs 302 redirect codes, consistency models for different data types, and the trade-off between custom alias global uniqueness and write routing. The discussion of negative caching and hot key mitigation shows real-world awareness. Could have gone slightly deeper on consistency guarantees during cross-region replication conflicts.

Scalability & Reliability

Weight 20%

Excellent coverage of scalability with concrete numbers: 3.86k writes/sec average, 386k reads/sec average, 10x peak planning, 1.2TB/year storage estimate. Caching strategy is well-thought-out with CDN, regional Redis clusters, in-process LRU, and negative caching. Hot key handling is addressed. Reliability section covers multi-AZ, multi-region, automated failover, graceful degradation when analytics or cache fails, and continuous backups. The 95% cache hit target is realistic. Could have included more specific bandwidth calculations and latency budget breakdowns.

Clarity

Weight 10%

The answer is exceptionally well-organized with clear section headers matching the prompt's seven aspects. Bullet points and sub-sections make it easy to scan. Technical terms are used precisely. The flow from functional requirements through architecture to implementation details is logical. The capacity summary at the end ties everything together. The component list at the end provides a practical implementation starting point. Very readable and actionable for a senior engineer.

Judge Models OpenAI GPT-5.4

Total Score

Overall Comments

This is a strong, practical system design answer that covers all major areas requested by the prompt and is organized in a way that a senior engineer could build from. It does especially well on architecture, key generation, database choices, caching, multi-region reliability, and abuse prevention. The capacity section includes useful back-of-the-envelope estimates, though some math and assumptions are rough and could be expanded further with bandwidth, cache sizing, and more explicit daily or regional breakdowns. Trade-offs are discussed well, but a few choices remain somewhat broad rather than fully pinned down to one concrete implementation path.

View Score Details ▼

Architecture Quality

Weight 30%

The architecture is well-structured and realistic, with clear separation between API gateway, shorten service, redirect service, cache, primary mapping store, analytics pipeline, abuse detection, and observability. The redirect path is appropriately optimized and analytics are decoupled asynchronously, which is an important real-world design choice. Multi-AZ and multi-region concerns are addressed sensibly. A slightly higher score would require a more opinionated final architecture choice instead of listing several equivalent datastore options.

Completeness

Weight 20%

The answer addresses all seven required aspects in meaningful detail: functional requirements, high-level architecture, encoding strategy, database design, scalability and performance, reliability and availability, and rate limiting and abuse prevention. It also includes the requested capacity estimation and implementation starting point. Minor gaps include limited discussion of exact expiration enforcement mechanics and only a brief mention of redirect status code semantics.

Trade-off Reasoning

Weight 20%

The response demonstrates solid understanding of trade-offs, especially around ID generation approaches, SQL versus NoSQL, cache TTLs, analytics consistency, CDN caching, and custom alias uniqueness in multi-region setups. The reasoning is practical and reflects real system concerns. It loses some points because several sections present multiple technology choices without fully narrowing to a single preferred design and its consequences.

Scalability & Reliability

Weight 20%

Scalability and availability are handled well, with discussion of cache-first reads, hot-key mitigation, partitioning, replication, failover, queue-based analytics, and graceful degradation. The answer correctly prioritizes keeping redirects available even when analytics or cache components fail. Capacity planning is directionally good, but it could be stronger with more detailed QPS derivation, bandwidth estimates, cache hit assumptions translated into backend load, and storage overhead beyond the base record estimate.

Clarity

Weight 10%

The answer is very clear, logically organized, and easy to scan. Headings map directly to the prompt, bullets are concise but informative, and the final implementation checklist is useful. It reads like a practical engineering plan rather than a vague essay. The only minor issue is that a few sections are dense with options, which slightly reduces decisiveness.

Answer B: Google Gemini 2.5 Pro

1. Functional Requirements

Core Functionality:
- URL Shortening: Users can submit a long URL and receive a unique, shorter URL.
- URL Redirection: Accessing a short URL will redirect the user to the original long URL with a 301 (permanent) redirect.
Extended Features:
- Custom URLs: Users can choose a custom alias for their short URL (e.g., short.co/my-event).
- URL Expiration: Users can set an expiration time/date for a short URL, after which it will no longer redirect.
- Analytics: The service will track basic metrics for each link, such...

Show Full Answer ▼

1. Functional Requirements

Core Functionality:
- URL Shortening: Users can submit a long URL and receive a unique, shorter URL.
- URL Redirection: Accessing a short URL will redirect the user to the original long URL with a 301 (permanent) redirect.
Extended Features:
- Custom URLs: Users can choose a custom alias for their short URL (e.g., short.co/my-event).
- URL Expiration: Users can set an expiration time/date for a short URL, after which it will no longer redirect.
- Analytics: The service will track basic metrics for each link, such as total clicks, referrers, and geographic location of clicks.
- API Access: A REST API will be available for developers to integrate the shortening service into their applications.
- User Accounts: Registered users can manage their links, view analytics, and access higher API rate limits.

2. High-Level Architecture

The system is designed to be highly available and scalable, with a clear separation of read and write paths.

Load Balancer (LB): Distributes incoming traffic across multiple web servers to prevent any single server from being a bottleneck. It handles SSL termination.
Web/API Servers: A fleet of stateless servers running the application logic. They handle two primary requests:
- POST /api/v1/url (Write path): Creates a new short URL.
- GET /{short_key} (Read path): Redirects to the long URL.
Key Generation Service (KGS): A dedicated microservice responsible for generating unique, short keys. It pre-generates keys and stores them in a queue to ensure low latency for the write path.
Cache: A distributed in-memory cache (e.g., Redis Cluster) sits between the application servers and the database. It stores mappings of short_key to long_url for frequently accessed links to serve redirects at very low latency.
Database: The primary data store for all URL mappings and user data.
Analytics Pipeline: Redirect events are published to a message queue (e.g., Kafka). A separate service consumes these events, aggregates data, and stores it in a data warehouse (e.g., ClickHouse or AWS Redshift) for analytics queries.

Interaction Flow (Redirect):
Client -> Load Balancer -> API Server -> Cache (Redis). If cache hit, return redirect. If cache miss, -> Database -> API Server -> Client. The mapping is then written to the cache for subsequent requests.

3. URL Encoding Strategy

We will use a Base62 encoding strategy combined with a unique ID generator.

Approach: The Key Generation Service (KGS) will manage a global 64-bit counter (e.g., using a dedicated SQL database sequence, Zookeeper, or a custom service).
Process:
1. When an application server needs to create a link, it requests a unique ID from the KGS.
2. The KGS provides a unique number (e.g., 10001).
3. The application server converts this number to a Base62 string ([a-zA-Z0-9]). For example, 10001 in Base62 is 2k1.
4. To ensure a fixed length and avoid short, guessable keys, we can use a 7-character length. A 64-bit counter provides more than enough unique IDs (2^64), which can be mapped to a Base62 string. A 7-character Base62 string can represent 62^7 (~3.5 trillion) unique URLs, which is sufficient for our scale.
Collision Handling: This approach guarantees uniqueness, so collisions are impossible by design. This is superior to hashing the long URL, which can produce collisions and requires extra checks.

4. Database Design

Given the massive read throughput and scalability requirements, a NoSQL database like Apache Cassandra or Amazon DynamoDB is the ideal choice.

Why NoSQL?: The primary access pattern is a simple key-value lookup (short_key -> long_url). NoSQL databases excel at horizontal scaling, high availability, and low-latency reads for this pattern, which is perfect for our 100:1 read-to-write ratio.
SQL Trade-off: A sharded SQL database (e.g., PostgreSQL with Citus) could work but would introduce more operational complexity for sharding and scaling at this level.

Core Table Schema (Cassandra/DynamoDB-like):

Table Name: urls
Primary Key: short_key (This will serve as the partition key, ensuring data is evenly distributed across the cluster).
Columns:
- short_key (text, Partition Key)
- long_url (text)
- user_id (uuid)
- created_at (timestamp)
- expires_at (timestamp) - We can set a Time-To-Live (TTL) on the row based on this value for automatic cleanup.

5. Scalability and Performance

Capacity Estimations:
- Writes: 100M URLs/month ≈ 40 writes/sec (avg).
- Reads: 10B reads/month ≈ 3,850 reads/sec (avg). Peak traffic could be 10-20x this amount.
- Storage: 100M URLs/month * 12 months * 5 years = 6B URLs. At ~500 bytes per entry, this is ~3 TB of data.
Scaling the Read Path:
- Caching: This is the most critical component. We will use a multi-layered cache.
  - CDN Cache: Use a CDN (e.g., Cloudflare) to cache redirects at the edge. This can offload a significant percentage of traffic for popular links.
  - Distributed Cache: An in-memory cache like Redis will be used. Given 10B reads/month, many URLs will be 'hot'. Caching even 10-20% of the most active URLs can reduce database read load by over 80-90%.
- Database Sharding: Our choice of NoSQL with short_key as the partition key handles this naturally. We can add more nodes to the database cluster to scale both storage and throughput horizontally.

6. Reliability and Availability

Redundancy: All components will be deployed across multiple Availability Zones (AZs).
- Load Balancers, API Servers: Deployed in an auto-scaling group across at least two AZs.
- Cache (Redis): Deployed in a cluster configuration with replicas in different AZs.
- Database (Cassandra): The database will be configured with a replication factor of 3, with replicas stored in different AZs to ensure data durability and availability during an AZ failure.
Failover: Automated failover mechanisms will be in place. The load balancer will automatically route traffic away from unhealthy instances. Managed database services typically handle database failover automatically.
Data Backup: Regular snapshots of the database will be taken and stored in a separate region for disaster recovery.

7. Rate Limiting and Abuse Prevention

Rate Limiting: Implemented at the API Gateway or in the application logic using an in-memory store like Redis.
- Limits will be based on IP address for anonymous users (e.g., 10 creations/minute) and on API key for registered users (with different tiers).
Abuse Prevention:
- Malicious URL Scanning: Integrate with a service like the Google Safe Browsing API to check every submitted long_url against a blocklist of malicious sites.
- CAPTCHA: For anonymous users exhibiting high-frequency submission patterns.
- Blacklisting: Maintain an internal blacklist of domains and user accounts known for spamming.

Result

Winning Votes

0 / 3

Average Score

Judge Models Google Gemini 2.5 Flash

Total Score

Overall Comments

The design for the URL shortening service is exceptionally well-structured and comprehensive. It addresses all prompt requirements with significant technical depth and practical considerations. Strengths include the clear architecture with dedicated services like the Key Generation Service, a robust URL encoding strategy that avoids collisions, and a well-justified choice of NoSQL for the primary data store. The capacity estimations are accurate, and the scalability and reliability measures, including multi-layered caching and multi-AZ deployment, are very strong. The plan is highly practical and provides a solid foundation for implementation.

View Score Details ▼

Architecture Quality

Weight 30%

The high-level architecture is well-defined, with a logical separation of concerns into components like the Load Balancer, Web/API Servers, Key Generation Service (KGS), Cache, Database, and Analytics Pipeline. The interaction flow for redirection is clearly explained, demonstrating a robust design for handling high read traffic efficiently. The inclusion of a dedicated KGS is a strong architectural decision.

Completeness

Weight 20%

All seven aspects of the prompt are addressed thoroughly. From functional requirements, including extended features like custom URLs and analytics, to detailed sections on database design, scalability, reliability, and abuse prevention, the answer leaves no major stone unturned. The depth of detail in each section is commendable.

Trade-off Reasoning

Weight 20%

The answer provides strong justifications for its design choices. The rationale for using Base62 encoding with a unique ID generator over hashing is well-explained, highlighting the benefit of avoiding collisions. The choice of a NoSQL database (Cassandra/DynamoDB) over sharded SQL is clearly articulated based on the read-heavy access pattern and horizontal scalability needs. The discussion on multi-layered caching also demonstrates good trade-off understanding.

Scalability & Reliability

Weight 20%

The capacity estimations are accurate and well-calculated for both writes and reads, as well as storage. The proposed strategies for scalability, such as multi-layered caching (CDN and Redis), and database sharding with a NoSQL solution, are excellent. Reliability is also thoroughly covered with redundancy across Availability Zones, automated failover mechanisms, and regular data backups, making the service highly available.

Clarity

Weight 10%

The plan is exceptionally clear and well-structured, using distinct headings and bullet points for readability. The language is precise, and the technical concepts are explained in an easy-to-understand manner. Specific technology recommendations (e.g., Redis, Cassandra, Kafka) with brief justifications add to the clarity and practicality of the design, making it a very usable starting point for implementation.

Judge Models Anthropic Claude Opus 4.6

Total Score

Overall Comments

This is a strong, well-structured answer that addresses all seven aspects of the prompt with meaningful detail. It demonstrates solid understanding of real-world system design trade-offs, includes concrete technology recommendations, provides capacity estimations, and presents a realistic, implementable design. The main weaknesses are that the capacity estimation section could be more thorough (e.g., bandwidth calculations are missing), the 301 vs 302 redirect trade-off for analytics is not discussed, and the KGS design could elaborate more on how the counter is distributed across multiple instances to avoid being a single point of failure. Overall, this is a plan a senior engineer could genuinely use as a starting point.

View Score Details ▼

Architecture Quality

Weight 30%

The architecture is well-designed with clear separation of read and write paths, a dedicated Key Generation Service, an analytics pipeline using Kafka, and multi-layered caching including CDN. The interaction flow is clearly described. The choice of a counter-based Base62 encoding with a KGS is a strong design decision that eliminates collisions. Minor weakness: the KGS itself could be a single point of failure and the answer does not deeply discuss how the KGS is made highly available or how multiple KGS instances coordinate (e.g., range-based allocation). The analytics pipeline with Kafka and ClickHouse is a practical and well-chosen addition.

Completeness

Weight 20%

All seven aspects from the prompt are addressed with meaningful detail, not just mentioned in passing. Functional requirements cover core and extended features. The architecture, encoding strategy, database design, scalability, reliability, and abuse prevention sections are all substantive. Capacity estimations are included. However, bandwidth estimation is missing, and the answer does not discuss the trade-off between 301 (permanent) and 302 (temporary) redirects, which is important for analytics since 301 redirects may be cached by browsers and bypass the server, undermining click tracking.

Trade-off Reasoning

Weight 20%

The answer discusses several important trade-offs: counter-based ID generation vs hashing (with clear justification for the counter approach), NoSQL vs SQL (with reasoning about access patterns and horizontal scaling), and caching strategies. The TTL feature for expiration in Cassandra is a practical insight. However, some trade-offs are mentioned but not deeply explored. For example, the SQL trade-off is dismissed somewhat quickly. The 301 vs 302 redirect trade-off is not discussed at all, which is a notable omission given that the answer specifies 301 redirects while also requiring analytics. Consistency model trade-offs in Cassandra (eventual vs strong consistency) are not discussed.

Scalability & Reliability

Weight 20%

The scalability section is solid with multi-layered caching (CDN + Redis), database sharding via partition keys, and clear capacity estimations for writes, reads, and storage. The reliability section covers multi-AZ deployment, replication factor of 3, auto-scaling groups, and disaster recovery backups. The estimate of 80-90% cache hit rate reducing database load is realistic. Peak traffic consideration (10-20x average) shows practical thinking. Minor gaps: no discussion of read replicas explicitly, no bandwidth estimation, and the KGS availability/redundancy is not addressed in the reliability section despite being a critical component.

Clarity

Weight 10%

The answer is exceptionally well-organized with clear headings, numbered sections, and bullet points that make it easy to follow. The interaction flow for redirects is described concisely. The schema is presented in a clear, readable format. Technology choices are named specifically (Redis, Cassandra, DynamoDB, Kafka, ClickHouse, Cloudflare). The writing is concise and professional throughout, making this very suitable as a starting point for implementation.

Judge Models OpenAI GPT-5.4

Total Score

Overall Comments

This is a solid and practical system design that covers all major areas of the prompt with good structure and realistic component choices. The architecture is coherent, the read/write paths are understandable, and the use of cache, async analytics, and multi-AZ deployment is appropriate. The strongest parts are the high-level architecture, encoding approach, and practical abuse prevention ideas. The main weaknesses are limited depth in some trade-offs and incomplete capacity analysis: the math is only partial, storage assumptions are not broken down, peak estimates are rough, and important details like redirect semantics, hot key handling, custom alias uniqueness, consistency choices, read replicas, and failure modes of the key generation service are not fully explored.

View Score Details ▼

Architecture Quality

Weight 30%

The proposed architecture is strong and implementable, with clear separation of API layer, cache, database, key generation, and analytics pipeline. The redirect flow is sensible and the use of Kafka plus an analytics store is a good production-oriented choice. However, some architectural details that matter at scale are missing, such as how custom aliases are reserved atomically, how the key generation service avoids becoming a single point of failure, whether redirects should be served directly from edge or app tier, and how expired or deleted links are invalidated from cache.

Completeness

Weight 20%

The answer addresses all seven required aspects and includes functional requirements, architecture, encoding, database choice, scalability, reliability, and abuse prevention. It also includes capacity estimates and a basic schema. Completeness is reduced slightly because analytics storage schema, user/account data design, expiration behavior on redirect, and more explicit handling of collisions or alias conflicts for custom URLs are not fully detailed.

Trade-off Reasoning

Weight 20%

There is decent reasoning behind choosing NoSQL for key-value lookups and counter plus Base62 over hashing. The answer correctly notes the collision trade-off versus hashing and highlights operational complexity for sharded SQL. Still, the trade-off discussion remains somewhat shallow: it does not discuss consistency requirements, write amplification from replication, pros and cons of pre-generated keys versus on-demand ID allocation, the limits of TTL behavior in different databases, or the practical pros and cons of SQL for transactional features like user ownership and custom aliases.

Scalability & Reliability

Weight 20%

The design shows good awareness of scaling reads with Redis and CDN, horizontal scaling in NoSQL, replication across AZs, backups, and automated failover. The rough throughput numbers are directionally correct and peak traffic is acknowledged. The score is held back because the capacity section is not fully developed: it lacks bandwidth calculations, cache sizing logic, replication-adjusted storage estimates, and more rigorous QPS planning. Reliability discussion also does not cover regional failover strategy, hot partition risks, or detailed failure handling for Redis and the key generation path.

Clarity

Weight 10%

The answer is well organized, easy to follow, and structured directly around the prompt sections. The component list and redirect flow are especially clear, making the design easy for a senior engineer to review. Minor deductions because some sections are concise where more explicit justification or schema detail would improve precision.

Comparison Summary

Final rank order is determined by judge-wise rank aggregation (average rank + Borda tie-break). Average score is shown for reference.

Judges: 3

Winner OpenAI GPT-5.2

Winning Votes

3 / 3

Average Score

View this answer

Google Gemini 2.5 Pro

Winning Votes

0 / 3

Average Score

View this answer

View head-to-head record for this model pair

Judging Results

Judge Models OpenAI GPT-5.4

GPT-5.2 Winner 92

Gemini 2.5 Pro 84

Judge Models Anthropic Claude Opus 4.6

GPT-5.2 Winner 92

Gemini 2.5 Pro 83

Judge Models Google Gemini 2.5 Flash

GPT-5.2 Winner 96

Gemini 2.5 Pro 94

Design a URL Shortening Service

Task Overview

Task Prompt

Answers

Answer A: OpenAI GPT-5.2

Answer B: Google Gemini 2.5 Pro

1. Functional Requirements

1. Functional Requirements

2. High-Level Architecture

3. URL Encoding Strategy

4. Database Design

5. Scalability and Performance

6. Reliability and Availability

7. Rate Limiting and Abuse Prevention

Comparison Summary

Judging Results

Related Tasks

Innovative Uses for Retired Electric Vehicle Batteries

Explain the Mechanism and Consequences of Chromosomal Nondisjunction

Summarize a Passage on the History and Science of Urban Heat Islands

Evaluating Evidence in a Product Recall Decision

Write a Client-Facing Email Explaining a Significant Project Delay

Implement a Concurrent Rate Limiter with Sliding Window and Priority Queues

Emergency Shelter Setup Plan for a Sudden Flood Event

Draft a Persuasive Internal Proposal to Adopt a Four-Day Work Week

Related Links