Answer A: OpenAI GPT-5.2
1) Functional Requirements

Core (MVP)
- Create short URL
  - Input: long_url, optional custom_alias, optional expiration, optional user_id/api_key
  - Output: short_url, key, created_at, expiry_at
- Redirect
  - GET /{key} -> 301/302 to long_url
  - Honor expiration/disabled state
- Basic analytics
  - Count redirects per key (total + time buckets)
  - Optional: referrer, country/region (coarse), user-agent family, device type
- Link management
  - Retrieve metadata (long_url, created_at, expiry_at, status)
  - Disable/delete (soft delete)

Nice-to-have
- Custom domains per customer
- Bulk shortening API
- QR code generation
- Anti-malware / safe-browsing checks
- A/B routing, UTM templates

Non-functional
- Very low redirect latency (p95 < ~20–50 ms from edge/cache)
- High availability (multi-AZ/region)
- Strong consistency not required for analytics, but required for the key -> URL mapping

2) High-Level Architecture

Traffic flow
- DNS -> CDN/Edge (optional but recommended)
- Global load balancer (GSLB) -> regional L7 load balancer
- API Gateway
  - Auth (API keys/OAuth), throttling, request validation
- Application services (stateless)
  - Shorten Service (writes)
  - Redirect Service (reads, extremely hot path)
  - Analytics Ingest Service (async)

Data layer
- Primary key-value store for the key -> destination record mapping
- Cache layer (Redis/Memcached) for hot key lookups
- Analytics pipeline
  - Redirect Service emits an event to a log/queue (Kafka/PubSub/Kinesis)
  - Stream processor aggregates into an OLAP store (ClickHouse/BigQuery/Druid) and/or a time-series store (Cassandra/Scylla)
  - Periodic rollups for dashboards

Supporting services
- Key generation service (if using pre-generated IDs)
- Abuse detection service (URL reputation, user behavior)
- Observability: metrics, tracing, logs

Interaction
- Create:
  - Client -> API Gateway -> Shorten Service
  - Validate URL, check abuse, check custom alias uniqueness if provided
  - Obtain a unique key (encoding strategy below)
  - Write mapping to DB
  - Invalidate/prime cache
- Redirect:
  - Client -> CDN/Edge -> Redirect Service
  - Look up key in cache; on miss, query DB
  - If found and not expired/disabled: respond 301/302
  - Emit async analytics event

3) URL Encoding Strategy

Goals: uniqueness, short length, high throughput, no central bottleneck.

Recommended: numeric ID + Base62
- Use a monotonically increasing 64-bit ID (or time-ordered ID) and encode it in Base62 (0-9a-zA-Z).
- For 100M new URLs/month (~40 writes/sec average; higher at peak), ID generation should comfortably support thousands/sec for headroom.

Options:
A) Database sequence (simple)
- Pros: easy, strongly unique
- Cons: can become a bottleneck and is hard to run across shards; requires coordination
B) Distributed ID (Snowflake-like) (recommended)
- 64-bit: timestamp + region/node + sequence
- Pros: scalable, no single writer
- Cons: slightly longer keys if you encode the full 64 bits; still compact in Base62 (up to 11 chars)
C) Pre-generated key pool
- A background job generates random Base62 strings and stores an unused pool; the app reserves keys from it.
- Pros: decouples keys from ordering, can keep keys short
- Cons: pool-management complexity

Collision handling
- For the ID-based approach: no collisions by construction.
- For custom aliases or random keys: enforce uniqueness with a conditional put/unique constraint; on collision, retry with a new key.

Key length
- 100M/month implies ~1.2B URLs/year. 62^7 ≈ 3.5T, so 7 chars is plenty with sequential IDs; Snowflake IDs may encode to 10–11 chars, which is still acceptable.

4) Database Design

Primary store requirements
- Very high read QPS, key-based lookups, small records, low latency.
- Strongly consistent writes for key uniqueness; reads can be eventually consistent if the cache is correct, but prefer consistent read-after-write for new links.

Recommended: DynamoDB / Cassandra / ScyllaDB (NoSQL KV), or MySQL/Postgres with sharding.
- NoSQL KV pros: horizontal scale, high throughput, predictable latency.
- SQL pros: constraints, transactions, simpler for custom-alias uniqueness and admin queries; but sharding/replicas become more complex at scale.

Pragmatic choice
- Mapping store: DynamoDB (or Cassandra/Scylla) as the system of record.
- Optional relational store for user/account/billing.

Core schema (KV / wide-column)

Table: url_mapping
- key (partition key, string)
- long_url (string)
- created_at (timestamp)
- expiry_at (timestamp, nullable)
- status (active | disabled | deleted)
- user_id (string/uuid, nullable)
- custom_alias (bool)
- domain (string, default)
- last_accessed_at (timestamp, nullable)
- redirect_code (int: 301/302)

Indexes / access patterns
- Primary: key -> record
- By user (for the management UI): secondary index
  - GSI: user_id as partition key, created_at as sort key (or reverse)
- By long_url (optional dedupe): hash(long_url) index (only if you want "same long URL returns same key" behavior)

Analytics storage (separate)
- Raw events in object storage (S3/GCS) + streaming aggregation into OLAP.
- Aggregated table example (ClickHouse): (key, day/hour, redirects, unique_ips_approx, country, referrer_domain, ua_family)

SQL vs NoSQL trade-off summary
- SQL: easier uniqueness for custom aliases, ad-hoc queries; harder to scale writes/reads without careful sharding.
- NoSQL: best for the primary lookup workload; must design access patterns upfront; custom-alias uniqueness handled via conditional writes.

5) Scalability and Performance

Traffic estimates
- Writes: 100M/month ≈ 40/s average; plan for 10x peak => ~400/s.
- Reads: 100:1 read/write ratio => ~4k/s average redirects; plan for 10x peak => ~40k/s globally (viral links can spike far higher).

Storage
- 100M/month * 12 = 1.2B mappings/year.
- Record size (key ~10 B, URL avg 200 B, metadata): assume ~500 B–1 KB.
- 1.2B * 1 KB ≈ 1.2 TB/year (plus replication and indexes).

Caching
- Redis/Memcached cluster per region.
- Cache key: short key; value: long_url + status + expiry_at + redirect_code.
- TTL strategy:
  - For non-expiring links: long TTL (e.g., 1–7 days) with refresh-on-access.
  - For expiring links: TTL aligned with expiry.
- Negative caching for missing/disabled keys (short TTL) to reduce DB hits.
- CDN/Edge caching for redirects where safe:
  - Cache 301s for public, non-expiring links; be careful with per-user or dynamic redirects.

Sharding/partitioning
- NoSQL: partition by key; ensure uniform distribution.
- If SQL: shard by key hash; maintain a routing layer.

Read replicas
- If using SQL or a replicated KV store: add read replicas for management and other read-heavy, non-redirect queries.

Hot keys
- Extremely popular short URLs can overload cache nodes.
- Use consistent hashing with sufficient virtual nodes.
- Consider an in-process LRU cache in the redirect service.
- Edge caching at the CDN reduces origin load.

Write path optimization
- Batch analytics events; never block a redirect on analytics.

6) Reliability and Availability

Multi-AZ
- Run API/Redirect services across multiple AZs behind a load balancer.
- Cache: Redis cluster with replication + automatic failover (or managed Redis).
- DB: multi-AZ replication; quorum reads/writes as appropriate.

Multi-region (recommended for a global service)
- Active-active redirects: replicate the mapping DB cross-region (DynamoDB global tables / Cassandra multi-DC).
- Writes can be routed to the nearest region; conflict resolution:
  - For ID-based keys, collisions are unlikely; custom aliases require global uniqueness, handled by routing custom-alias creation to a "home region" per domain or by strongly consistent global coordination (rare path).

Failover
- Health checks + automated traffic shifting via GSLB.
- Stateless services allow fast scaling and replacement.

Backups and DR
- Continuous backups/snapshots of the mapping store.
- Store raw analytics logs in durable object storage.

Graceful degradation
- If the analytics pipeline is down, continue redirects and buffer events (queue retention) or sample.
- If the cache is down, the redirect service falls back to the DB (expect a latency increase, but the service remains functional).

7) Rate Limiting and Abuse Prevention

Rate limiting
- Per API key/user/IP limits on creation endpoints (token bucket / leaky bucket at the API Gateway).
- Separate, higher limits for redirects; protect against floods with CDN/WAF.

Abuse controls
- URL validation: allow-list schemes (http/https), max length, block internal IP ranges (SSRF-style) for preview/scanning components.
- Malware/phishing detection:
  - Integrate with Safe Browsing/reputation feeds.
  - Asynchronous scanning; optionally a "pending" state until the scan passes for untrusted users.
- Custom alias policies: reserved words, profanity filter, trademark policies.
- Bot detection: device-fingerprinting heuristics, CAPTCHA for suspicious creation traffic.
- Redirect protection:
  - WAF rules, anomaly detection on redirect patterns.
  - Per-key redirect-rate alerts; temporary throttling.

Capacity/Throughput Planning Summary (rough)
- Peak redirects: plan for tens of thousands of requests/sec globally, with headroom for viral spikes.
- CDN offload + regional caches should serve the majority (aim for >95% cache hit on the hot set).
- Origin DB reads should be minimized; design for <5% misses.
- Writes: hundreds per second at peak; distributed ID generation avoids a central bottleneck.
- Data growth: ~1.2B mappings/year; plan for multi-TB storage including replication and indexing overhead.

Implementation starting point (component list)
- API Gateway + auth
- Shorten Service
- Redirect Service (optimized, minimal dependencies)
- Redis cache cluster per region
- Mapping DB (DynamoDB/Cassandra/Scylla)
- ID generator (Snowflake) library/service
- Kafka/Kinesis + stream processors + OLAP store
- Admin/management service + user DB
- Monitoring/alerting + WAF/CDN
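The back-of-the-envelope numbers in section 5 can be reproduced with simple arithmetic; this sketch assumes a 30-day month, a 100:1 read/write ratio, and ~1 KB per record, all taken from the answer above.

```python
# Capacity math for ~100M new URLs/month (30-day month assumed).
SECONDS_PER_MONTH = 30 * 24 * 3600            # 2,592,000 s
writes_per_month = 100_000_000

avg_writes_per_sec = writes_per_month / SECONDS_PER_MONTH   # ~38.6/s
peak_writes_per_sec = avg_writes_per_sec * 10               # ~386/s at 10x peak

avg_reads_per_sec = avg_writes_per_sec * 100                # 100:1 ratio, ~3.9k/s
peak_reads_per_sec = avg_reads_per_sec * 10                 # ~39k/s at 10x peak

mappings_per_year = writes_per_month * 12                   # 1.2B mappings/year
storage_per_year_tb = mappings_per_year * 1_000 / 1e12      # ~1.2 TB/year at ~1 KB/record
```

These figures exclude replication factor and index overhead, which typically multiply raw storage by 3x or more.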
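The Base62 scheme recommended in section 3 can be sketched as follows; the digit ordering of the alphabet and the helper names are illustrative choices, not mandated by the design.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"  # 0-9a-zA-Z
BASE = len(ALPHABET)  # 62

def encode_base62(n: int) -> str:
    """Encode a non-negative integer ID as a Base62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def decode_base62(s: str) -> int:
    """Inverse of encode_base62."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

A full 64-bit ID encodes to at most 11 characters (62^11 > 2^64), matching the 10–11 character estimate for Snowflake IDs, while 7 characters already cover 62^7 ≈ 3.5T sequential IDs.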
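Option B's Snowflake-like layout (timestamp + region/node + sequence) could look like the sketch below. The 41/10/12 bit split and the custom epoch are assumptions borrowed from the common Snowflake layout; a production version would also need locking for thread safety and explicit handling of backwards clock movement.

```python
import time

class SnowflakeLike:
    """Sketch of a 64-bit time-ordered ID: 41-bit ms timestamp, 10-bit node, 12-bit sequence.
    Single-threaded sketch only: no lock, no clock-skew handling."""

    def __init__(self, node_id: int, epoch_ms: int = 1_600_000_000_000):
        assert 0 <= node_id < 1024          # 10-bit node ID
        self.node_id = node_id
        self.epoch_ms = epoch_ms
        self.last_ms = -1
        self.seq = 0

    def next_id(self) -> int:
        now = int(time.time() * 1000)
        if now == self.last_ms:
            self.seq = (self.seq + 1) & 0xFFF   # up to 4096 IDs per ms per node
            if self.seq == 0:                   # sequence exhausted: spin to the next ms
                while now <= self.last_ms:
                    now = int(time.time() * 1000)
        else:
            self.seq = 0
        self.last_ms = now
        return ((now - self.epoch_ms) << 22) | (self.node_id << 12) | self.seq
```

Because the timestamp occupies the high bits, IDs from one node are strictly increasing, which keeps Base62-encoded keys roughly time-sortable.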
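The cache-first redirect lookup with negative caching (sections 2 and 5) can be illustrated with an in-memory dict standing in for Redis and the mapping DB; all class and parameter names here are hypothetical.

```python
import time

NEG = object()  # sentinel cached for missing/disabled keys

class RedirectLookup:
    """Cache-first key resolution with short-TTL negative caching for misses."""

    def __init__(self, db: dict, pos_ttl: float = 86400.0, neg_ttl: float = 60.0):
        self.db = db            # stands in for the mapping store (e.g., DynamoDB)
        self.cache = {}         # key -> (value_or_NEG, expires_at); stands in for Redis
        self.pos_ttl = pos_ttl
        self.neg_ttl = neg_ttl

    def resolve(self, key: str):
        now = time.monotonic()
        hit = self.cache.get(key)
        if hit and hit[1] > now:
            value = hit[0]
            return None if value is NEG else value
        record = self.db.get(key)           # cache miss: fall back to the DB
        if record is None:
            # Negative caching: remember the miss briefly to shield the DB
            # from repeated lookups of nonexistent keys.
            self.cache[key] = (NEG, now + self.neg_ttl)
            return None
        self.cache[key] = (record, now + self.pos_ttl)
        return record
```

Note the asymmetric TTLs: positive entries live for days, while negative entries expire quickly so that newly created keys become resolvable without an explicit invalidation.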
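For the per-key/user/IP limits on creation endpoints (section 7), a minimal token-bucket sketch is shown below; the class shape and parameters are illustrative, and a real deployment would more likely use the API gateway's built-in limiter or a Redis-backed one shared across instances.

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity` requests, refilled at `rate` tokens/sec."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full so an idle client can burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

One bucket per (api_key, endpoint) pair gives the per-key creation limits described above, while redirects would bypass this path and rely on CDN/WAF protection.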
Result
Winning Votes: 3 / 3
Average Score
Total Score
Overall Comments
The design for the URL shortening service is exceptionally comprehensive, well-structured, and technically sound. It addresses all prompt requirements with significant depth, offering practical solutions and justified trade-offs. Strengths include detailed architectural components, a robust URL encoding strategy, thoughtful database design with schema, and extensive coverage of scalability, reliability, and abuse prevention. The capacity estimations are integrated effectively. The plan is clear, concise, and provides a solid foundation for implementation, demonstrating an excellent understanding of distributed systems design.
Score Details
Architecture Quality
Weight: 30%. The high-level architecture is very well-defined, delineating clear components such as API Gateway, separate services for writes and reads (Shorten, Redirect), and an asynchronous analytics pipeline. The proposed data layer with a primary KV store, cache, and OLAP for analytics is appropriate for the workload. The interaction flows for create and redirect operations are precisely described, highlighting the critical role of caching for the hot redirect path and considering global distribution.
Completeness
Weight: 20%. The answer provides a complete and detailed response to all seven aspects of the prompt. It covers functional and non-functional requirements, a comprehensive high-level architecture, a well-reasoned URL encoding strategy, detailed database design with schema and trade-offs, robust scalability and reliability mechanisms, and practical abuse prevention strategies. The inclusion of rough capacity estimations and an implementation starting point further enhances its completeness.
Trade-off Reasoning
Weight: 20%. The answer demonstrates strong reasoning for various technical trade-offs. It clearly discusses the pros and cons of different URL encoding strategies (DB sequence vs. distributed ID vs. pre-generated pool) and justifies the choice of numeric ID + Base62. The detailed comparison between SQL and NoSQL for the primary data store, including their respective challenges for scaling and unique constraints, is excellent. Cache TTL strategies and multi-region conflict resolution are also well-considered.
Scalability & Reliability
Weight: 20%. Scalability is thoroughly addressed through detailed traffic estimates, comprehensive caching strategies (Redis, CDN, negative caching), sharding/partitioning, and hot key management. Reliability is equally well-covered with multi-AZ and multi-region deployments, robust replication, failover mechanisms, continuous backups, and strategies for graceful degradation. The proposed solutions are practical and robust, ensuring high availability and performance under heavy load.
Clarity
Weight: 10%. The plan is exceptionally clear, well-structured, and easy to follow. The use of clear headings, subheadings, and bullet points makes the content highly digestible. The language is precise and technical, suitable for a senior engineer. Specific technology recommendations (e.g., DynamoDB, Cassandra, Snowflake, Redis, ClickHouse) are provided with context, further enhancing the clarity and practicality of the design.
Total Score
Overall Comments
This is an excellent, comprehensive system design answer that addresses all seven required aspects with meaningful depth. It includes concrete capacity estimations, specific technology recommendations with rationale, detailed schema definitions, and thorough discussion of trade-offs. The answer is well-structured with clear sections, covers edge cases like hot keys and graceful degradation, and provides practical implementation guidance. Minor areas for improvement include slightly more detailed back-of-the-envelope math for bandwidth and a text-described architecture diagram, but overall this is a very strong response suitable as a senior engineer's starting point.
Score Details
Architecture Quality
Weight: 30%. The architecture is well-designed with clear separation of concerns: stateless application services, dedicated read/write paths, async analytics pipeline via Kafka, caching layer, and CDN/edge. The interaction flows for both create and redirect are clearly described. The choice of Snowflake-like distributed ID generation is well-justified. The multi-region active-active design with DynamoDB global tables or Cassandra multi-DC is practical. The only minor gap is the lack of a text-based diagram, though the textual description of the flow is quite clear.
Completeness
Weight: 20%. All seven aspects from the prompt are thoroughly addressed. Functional requirements include both core and nice-to-have features. The URL encoding strategy covers multiple approaches with pros/cons. Database design includes schema, access patterns, and indexes. Scalability covers caching, sharding, hot keys, and CDN. Reliability covers multi-AZ, multi-region, failover, backups, and graceful degradation. Rate limiting and abuse prevention are detailed. Capacity estimations are included with writes/sec, reads/sec, and storage calculations. The answer also includes non-functional requirements and an implementation component list.
Trade-off Reasoning
Weight: 20%. Strong trade-off analysis throughout. SQL vs NoSQL is discussed with specific pros and cons for this use case. Three ID generation approaches are compared with clear reasoning for recommending Snowflake-like IDs. Cache TTL strategies differentiate between expiring and non-expiring links. The answer discusses 301 vs 302 redirect codes, consistency models for different data types, and the trade-off between custom alias global uniqueness and write routing. The discussion of negative caching and hot key mitigation shows real-world awareness. Could have gone slightly deeper on consistency guarantees during cross-region replication conflicts.
Scalability & Reliability
Weight: 20%. Excellent coverage of scalability with concrete numbers: ~40 writes/sec average, ~4k reads/sec average, 10x peak planning, 1.2TB/year storage estimate. Caching strategy is well-thought-out with CDN, regional Redis clusters, in-process LRU, and negative caching. Hot key handling is addressed. Reliability section covers multi-AZ, multi-region, automated failover, graceful degradation when analytics or cache fails, and continuous backups. The 95% cache hit target is realistic. Could have included more specific bandwidth calculations and latency budget breakdowns.
Clarity
Weight: 10%. The answer is exceptionally well-organized with clear section headers matching the prompt's seven aspects. Bullet points and sub-sections make it easy to scan. Technical terms are used precisely. The flow from functional requirements through architecture to implementation details is logical. The capacity summary at the end ties everything together. The component list at the end provides a practical implementation starting point. Very readable and actionable for a senior engineer.
Total Score
Overall Comments
This is a strong, practical system design answer that covers all major areas requested by the prompt and is organized in a way that a senior engineer could build from. It does especially well on architecture, key generation, database choices, caching, multi-region reliability, and abuse prevention. The capacity section includes useful back-of-the-envelope estimates, though some math and assumptions are rough and could be expanded further with bandwidth, cache sizing, and more explicit daily or regional breakdowns. Trade-offs are discussed well, but a few choices remain somewhat broad rather than fully pinned down to one concrete implementation path.
Score Details
Architecture Quality
Weight: 30%. The architecture is well-structured and realistic, with clear separation between API gateway, shorten service, redirect service, cache, primary mapping store, analytics pipeline, abuse detection, and observability. The redirect path is appropriately optimized and analytics are decoupled asynchronously, which is an important real-world design choice. Multi-AZ and multi-region concerns are addressed sensibly. A slightly higher score would require a more opinionated final architecture choice instead of listing several equivalent datastore options.
Completeness
Weight: 20%. The answer addresses all seven required aspects in meaningful detail: functional requirements, high-level architecture, encoding strategy, database design, scalability and performance, reliability and availability, and rate limiting and abuse prevention. It also includes the requested capacity estimation and implementation starting point. Minor gaps include limited discussion of exact expiration enforcement mechanics and only a brief mention of redirect status code semantics.
Trade-off Reasoning
Weight: 20%. The response demonstrates solid understanding of trade-offs, especially around ID generation approaches, SQL versus NoSQL, cache TTLs, analytics consistency, CDN caching, and custom alias uniqueness in multi-region setups. The reasoning is practical and reflects real system concerns. It loses some points because several sections present multiple technology choices without fully narrowing to a single preferred design and its consequences.
Scalability & Reliability
Weight: 20%. Scalability and availability are handled well, with discussion of cache-first reads, hot-key mitigation, partitioning, replication, failover, queue-based analytics, and graceful degradation. The answer correctly prioritizes keeping redirects available even when analytics or cache components fail. Capacity planning is directionally good, but it could be stronger with more detailed QPS derivation, bandwidth estimates, cache hit assumptions translated into backend load, and storage overhead beyond the base record estimate.
Clarity
Weight: 10%. The answer is very clear, logically organized, and easy to scan. Headings map directly to the prompt, bullets are concise but informative, and the final implementation checklist is useful. It reads like a practical engineering plan rather than a vague essay. The only minor issue is that a few sections are dense with options, which slightly reduces decisiveness.