Orivel

Design a Global URL Shortening Service

Compare model answers for this System Design benchmark and review scores, judging comments, and related examples.



Contents

Task Overview

Benchmark Genres

System Design

Task Creator Model

Answering Models

Judge Models

Task Prompt


Design a public URL shortening service similar to Bitly. Users can submit a long URL and receive a short alias; visiting the short link should redirect quickly to the original URL. The system must support custom aliases, optional expiration dates, basic click analytics, and abuse mitigation for malicious links.

Requirements and constraints:

- Functional requirements:
  - Create short URLs for long URLs.
  - Redirect short URLs to original URLs.
  - Support custom aliases when available.
  - Support optional expiration time per link.
  - Record click events for analytics.
  - Allow users to disable a link manually.
- Scale assumptions:
  - 120 million new short URLs per month.
  - 1.5 billion redirects per day.
  - Redirect traffic is globally distributed and read-heavy.
  - Analytics data should be queryable within 15 minutes.
- Performance targets:
  - Redirect p95 latency under 80 ms for most regions.
  - Short-link creation p95 under 300 ms.
  - 99.99% availability for redirects.
- Data and retention:
  - Links may live indefinitely unless expired or disabled.
  - Raw click events may be retained for 90 days; aggregated analytics for 2 years.
- Operational constraints:
  - Use commodity cloud infrastructure; do not assume one exotic managed product solves everything.
  - Budget matters: justify any replication, caching, and storage choices.
  - Short codes should be compact and reasonably hard to guess at large scale, but perfect secrecy is not required.

In your answer, provide:

1. A high-level architecture with major components and data flow.
2. Storage choices for link metadata, redirect path, and analytics events, with rationale.
3. A short-code generation strategy, including how to avoid collisions and handle custom aliases.
4. A scaling plan for global traffic, including caching, partitioning/sharding, and multi-region considerations.
5. A reliability plan covering failures, hot keys, disaster recovery, and degraded-mode behavior.
6. Key APIs and core data models.
7. Abuse mitigation and security considerations.
8. The main trade-offs you made and why.

Judging Policy


A strong answer should propose a coherent end-to-end architecture that clearly separates write, read, and analytics concerns; match the design to the stated traffic and latency goals; and justify storage, caching, replication, and partitioning choices in practical terms. It should address global read-heavy behavior, collision-resistant code generation, custom alias handling, expiration and disablement, eventual analytics freshness, and realistic multi-region reliability. Good answers discuss failure modes such as cache loss, regional outages, datastore issues, and hot links, and explain graceful degradation. They should include sensible APIs and data models, abuse prevention measures, and explicit trade-offs on consistency, cost, latency, and operational complexity. Answers that stay purely generic, ignore constraints, or omit trade-off reasoning should score lower.

Answers


1. High-level architecture with major components and data flow:

- **User/Client:** Interacts with the service.
- **API Gateway/Load Balancer:** Entry point for all requests; handles routing, rate limiting, and SSL termination.
- **Shorten Service:** Microservice responsible for creating, updating, and managing short URLs. Interacts with the Link Metadata DB and Cache.
- **Redirect Service:** Microservice optimized for high-throughput, low-latency redirects. Primarily reads from the Cache, falls back to the Link Metadata DB. Publishes click events to a Message Queue.
- **Distributed Cache (e.g., Redis Cluster):** Stores hot short_code to long_url mappings for rapid redirects.
- **Link Metadata Database (e.g., Cassandra/DynamoDB):** Stores all short URL metadata (short_code, long_url, expiration, status, etc.).
- **Message Queue (e.g., Kafka/Kinesis):** Ingests raw click events from the Redirect Service for asynchronous processing.
- **Analytics Processor:** Consumes raw click events from the Message Queue, performs aggregation, and stores the results in the Analytics Database.
- **Analytics Database (e.g., ClickHouse/Redshift):** Stores aggregated click data for querying.
- **Data Lake (e.g., S3):** Stores raw click events for long-term retention and deeper analysis.
- **Abuse Detection Service:** Integrates with the Shorten Service to scan new URLs for malicious content.
- **Monitoring & Alerting:** Observability for all components.

**Data Flow:**

- **Shorten Request:** User -> API Gateway -> Shorten Service -> Abuse Detection -> Link Metadata DB (write) -> Cache (write).
- **Redirect Request:** User -> CDN (optional) -> Load Balancer -> Redirect Service -> Cache (read) -> Link Metadata DB (fallback read) -> Message Queue (write click event) -> Redirect to Long URL.
- **Analytics Processing:** Message Queue -> Analytics Processor -> Analytics DB (write aggregated) / Data Lake (write raw).

2.
Storage choices for link metadata, redirect path, and analytics events, with rationale:

- **Link Metadata (Short URL -> Long URL, Expiration, Status, etc.):**
  - **Choice:** Globally distributed NoSQL database (e.g., Apache Cassandra or AWS DynamoDB Global Tables).
  - **Rationale:** Handles high read/write throughput (1.5B reads/day, 120M writes/month), offers low-latency access from multiple regions, provides high availability, and scales horizontally. The primary key would be the `short_code` for efficient lookups.
- **Redirect Path (Short Code -> Long URL mapping for fast lookup):**
  - **Choice:** Distributed in-memory cache (e.g., Redis Cluster).
  - **Rationale:** Crucial for achieving p95 latency under 80 ms for redirects. Significantly reduces load on the primary database. Hot links are aggressively cached with appropriate TTLs (e.g., based on link expiration or an LRU policy). Replicated across regions for local access.
- **Analytics Events (Raw Clicks):**
  - **Choice:** Message Queue (e.g., Apache Kafka or AWS Kinesis) for ingestion, followed by a Data Lake (e.g., AWS S3) for storage.
  - **Rationale:** Kafka/Kinesis handles the immense write volume (1.5B events/day) by decoupling the redirect path from analytics processing, ensuring redirects remain fast. S3 provides cost-effective, highly durable storage for raw events retained for 90 days, suitable for batch processing and historical analysis.
- **Aggregated Analytics:**
  - **Choice:** Columnar analytical database (e.g., ClickHouse or AWS Redshift).
  - **Rationale:** Optimized for complex analytical queries and aggregations over large datasets. Allows fast querying of aggregated data (e.g., daily clicks, browser distribution) within 15 minutes, retained for 2 years, without impacting the operational database.

3. Short-code generation strategy, including how to avoid collisions and handle custom aliases:

**Short-code Generation Strategy:**

1.
**Distributed ID Generation:** Use a distributed, unique ID generator (e.g., a custom service generating Snowflake-like IDs or UUID v7) to produce a globally unique, monotonically increasing 64-bit integer ID.
2. **Base62 Encoding:** Encode this unique integer ID into a compact Base62 string (0-9, a-z, A-Z). A 64-bit ID can produce a short code of 6-10 characters, offering a vast namespace (e.g., 6 characters provide 62^6 ≈ 56 billion unique codes, sufficient for 120M/month for many years).

**Collision Avoidance:**

- **ID-based:** Since the underlying ID is guaranteed unique, the Base62-encoded short code is also unique, inherently avoiding collisions for system-generated codes.
- **Random Fallback (for robustness):** As a secondary option or for specific use cases, a random string generator could be used. In that case, generate a candidate short code, then perform a quick lookup in the Link Metadata DB and Cache. If a collision is detected, regenerate and retry a few times. This is less efficient but provides a fallback.

**Custom Aliases:**

1. **User Submission:** Users submit their desired `custom_alias` along with the `long_url`.
2. **Validation:** The Shorten Service validates the `custom_alias` (e.g., length, allowed characters, not a reserved keyword, not blacklisted).
3. **Uniqueness Check:** Before creation, the Shorten Service performs a lookup in the Link Metadata DB to check whether the `custom_alias` already exists. This check must be strongly consistent.
4. **Reservation:** If the `custom_alias` is available, it is stored directly as the `short_code` in the Link Metadata DB. If unavailable, the request is rejected, prompting the user to choose another.

4. Scaling plan for global traffic, including caching, partitioning/sharding, and multi-region considerations:

**Caching:**

- **CDN:** Utilize a Content Delivery Network (CDN) for static assets and potentially for DNS resolution of the short links, directing users to the nearest edge location.
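The ID-plus-Base62 scheme described above can be sketched as follows. This is a minimal illustration, not part of the original answer; the alphabet ordering and function names are assumptions.

```python
# Minimal Base62 codec for turning a unique 64-bit ID into a short code.
# Alphabet order (digits, then lowercase, then uppercase) is an assumption.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    # digits were produced least-significant first, so reverse them
    return "".join(reversed(digits))

def decode_base62(s: str) -> int:
    """Inverse of encode_base62."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Because the underlying ID is unique by construction, the encoded code needs no collision check; only custom aliases require a lookup.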
- **Distributed Cache (Redis Cluster):** Deploy Redis clusters in each major geographic region. These clusters store the most frequently accessed `short_code` to `long_url` mappings. Cache entries have TTLs aligned with link expiration or an LRU policy. This significantly offloads the database for the 1.5 billion daily redirects.

**Partitioning/Sharding:**

- **Link Metadata Database:** Shard the database by `short_code` (e.g., using a hash of the short code). This distributes data and query load across multiple database nodes. Each shard is replicated for high availability within a region.
- **Analytics Database:** Partition raw click events by time (e.g., daily or hourly partitions) and aggregated data by `short_code` and `date` to optimize query performance and data retention policies.

**Multi-Region Considerations:**

- **Active-Active for Redirects:** Deploy the Redirect Service, Distributed Cache, and Link Metadata Database (with global replication) in multiple geographic regions (e.g., North America, Europe, Asia-Pacific). Geo-DNS routes users to the closest region, ensuring low-latency redirects globally.
- **Active-Passive/Active-Active for the Shorten Service:** The Shorten Service can be deployed active-passive (primary in one region, replicas in others) or active-active, depending on write-consistency requirements and complexity. Writes are far less frequent than reads, so slightly higher creation latency is acceptable if it simplifies consistency.
- **Global Database Replication:** The Link Metadata Database (e.g., DynamoDB Global Tables or Cassandra's multi-datacenter replication) ensures data is replicated across regions, allowing local reads for redirects and providing disaster recovery capabilities.
- **Analytics Ingestion:** Regional Message Queues (Kafka/Kinesis) aggregate click events locally, which are then streamed to a central Data Lake/Analytics Database or replicated across regions for consolidated analysis.

5.
Reliability plan covering failures, hot keys, disaster recovery, and degraded-mode behavior:

**Failures:**

- **Redundancy:** All services (Shorten, Redirect, Analytics Processors) are deployed with N+1 redundancy across multiple availability zones within each region, behind load balancers.
- **Database Replication:** The Link Metadata DB and Analytics DB use synchronous/asynchronous replication across availability zones and regions to ensure data durability and availability.
- **Circuit Breakers & Retries:** Implement circuit breakers and exponential backoff/retry mechanisms in microservices to prevent cascading failures and handle transient issues gracefully.
- **Monitoring & Alerting:** Comprehensive monitoring of system health, performance metrics, and error rates, with automated alerts for critical issues.

**Hot Keys:**

- **Cache Sharding:** The Distributed Cache (Redis Cluster) is sharded, distributing hot keys across multiple nodes so no single node becomes a bottleneck.
- **Cache Warming:** For anticipated hot links (e.g., from major campaigns), pre-load them into the cache.
- **Rate Limiting:** Implement rate limiting at the API Gateway and Redirect Service levels to protect backend systems from sudden traffic spikes or abuse targeting specific links.

**Disaster Recovery:**

- **Multi-Region Active-Active:** The active-active deployment of the Redirect Service and the globally replicated Link Metadata DB provides inherent disaster recovery for redirects. If one region fails, traffic is automatically routed to another healthy region via Geo-DNS.
- **Data Backups:** Regular, automated backups of all critical databases (Link Metadata, Aggregated Analytics) to geographically separate, durable storage (e.g., S3).
- **Recovery Playbooks:** Documented and regularly tested procedures for failover, data restoration, and full system recovery.
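The exponential backoff/retry behavior mentioned above can be sketched as a small helper. This is an illustrative sketch; the function name and parameter defaults are assumptions, not part of the original answer.

```python
import random
import time

def call_with_backoff(fn, max_attempts=4, base_delay=0.05, max_delay=1.0):
    """Call fn(), retrying transient failures with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # retries exhausted: surface to the caller / circuit breaker
            # full jitter: sleep a random amount up to the capped exponential delay
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
```

In practice this would sit behind a circuit breaker so that a persistently failing dependency stops receiving retries at all.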
**Degraded-Mode Behavior:**

- **Analytics Degradation:** If the Message Queue or Analytics Processor experiences issues, raw click events can be temporarily buffered or, in extreme cases, dropped (with alerts). Redirects must continue to function without interruption.
- **Cache Misses/Failure:** If the Distributed Cache fails or experiences high latency, the Redirect Service falls back to querying the Link Metadata Database. This results in higher redirect latency but ensures service continuity. Circuit breakers prevent overwhelming the database.
- **Shorten Service Degradation:** If the Shorten Service is impaired, redirects remain unaffected. Users might experience slower link creation or temporary unavailability of the creation API, but existing links continue to function.

6. Key APIs and core data models:

**Key APIs:**

- **`POST /api/v1/shorten`**
  - **Description:** Creates a new short URL.
  - **Request Body:** `{"long_url": "string", "custom_alias": "string (optional)", "expiration_date": "ISO 8601 timestamp (optional)", "user_id": "string (optional)"}`
  - **Response:** `{"short_url": "string", "long_url": "string", "expires_at": "ISO 8601 timestamp (optional)"}`
- **`GET /{short_code}`**
  - **Description:** Redirects to the original long URL.
  - **Response:** HTTP 301/302 redirect to `long_url`.
- **`GET /api/v1/links/{short_code}/analytics`**
  - **Description:** Retrieves click analytics for a specific short URL.
  - **Response:** `{"short_code": "string", "total_clicks": "integer", "daily_clicks": [{"date": "YYYY-MM-DD", "count": "integer"}], "browser_distribution": {"Chrome": 100, "Firefox": 50}, "country_distribution": {"US": 70, "DE": 30}}`
- **`PUT /api/v1/links/{short_code}/status`**
  - **Description:** Updates the status of a short URL (e.g., disable).
  - **Request Body:** `{"status": "enum (active, disabled)"}`
  - **Response:** `{"short_code": "string", "status": "string"}`

**Core Data Models:**

- **Link Metadata (stored in the Link Metadata DB):**

  ```
  {
    "short_code": "string (Primary Key)",
    "long_url": "string",
    "user_id": "string (Foreign Key, optional)",
    "created_at": "timestamp",
    "expires_at": "timestamp (optional)",
    "status": "enum (active, disabled, expired)",
    "is_custom_alias": "boolean",
    "last_accessed_at": "timestamp (for LRU/cleanup)"
  }
  ```

- **Click Event (raw, stored in the Data Lake, ingested via the Message Queue):**

  ```
  {
    "event_id": "UUID (Primary Key)",
    "short_code": "string",
    "timestamp": "timestamp",
    "ip_address_hash": "string (anonymized/hashed)",
    "user_agent": "string",
    "referrer": "string (optional)",
    "country": "string (derived from IP)",
    "city": "string (derived from IP)"
  }
  ```

- **Aggregated Analytics (stored in the Analytics DB):**

  ```
  {
    "short_code": "string (Partition Key)",
    "date": "date (Sort Key)",
    "total_clicks": "integer",
    "browser_counts": "map<string, integer>",
    "os_counts": "map<string, integer>",
    "country_counts": "map<string, integer>",
    "referrer_counts": "map<string, integer>"
  }
  ```

7. Abuse mitigation and security considerations:

**Malicious Link Detection:**

- **Blacklisting:** Maintain a continuously updated blacklist of known malicious domains, phishing sites, and spam URLs. New `long_url` submissions are checked against this list.
- **Real-time Scanning:** Integrate with third-party safe-browsing APIs (e.g., Google Safe Browsing API, VirusTotal) during link creation to scan the `long_url` for known threats.
- **Heuristics:** Implement algorithms to detect suspicious URL patterns, excessive redirects, or keywords commonly associated with abuse.

**Spam and Abuse Prevention:**

- **Rate Limiting:** Apply strict rate limits on the `POST /shorten` API per IP address and/or authenticated user to prevent automated spamming.
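The per-client rate limiting described above is commonly a token bucket. A minimal in-process sketch follows; the class name and parameters are illustrative assumptions, and in production the bucket state would live in a shared store such as Redis, keyed per IP or user.

```python
import time

class TokenBucket:
    """Allow roughly `rate` requests/sec with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens replenished per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A request that returns `False` would be answered with HTTP 429 at the API Gateway.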
- **CAPTCHA/reCAPTCHA:** For anonymous link creation, implement CAPTCHA challenges to deter bots.
- **User Accounts:** Require user authentication for custom aliases, higher creation limits, and access to analytics. This provides accountability.
- **Reporting Mechanism:** Provide a clear way for users to report abusive short links. Reported links are reviewed and disabled if found malicious.
- **Link Disabling:** Allow users to manually disable their own links. The system can also automatically disable links flagged by abuse detection or reported by others.

**Security Considerations:**

- **HTTPS Everywhere:** Enforce HTTPS for all API endpoints and redirects to ensure encryption in transit.
- **Input Validation and Sanitization:** Rigorously validate and sanitize all user-provided inputs (`long_url`, `custom_alias`) to prevent common web vulnerabilities such as XSS, SQL injection, and path traversal.
- **Access Control:** Implement role-based access control (RBAC) for internal management tools and user-specific link management features.
- **Data Anonymization:** Anonymize or hash IP addresses and other personally identifiable information (PII) in click analytics to comply with privacy regulations (e.g., GDPR, CCPA).
- **Regular Security Audits:** Conduct periodic security audits, penetration testing, and vulnerability scanning to identify and remediate weaknesses.
- **DDoS Protection:** Utilize cloud-provider DDoS mitigation services (e.g., AWS Shield, Cloudflare) at the edge.

8. The main trade-offs you made and why:

- **Consistency vs. Availability/Latency for Redirects:**
  - **Trade-off:** Prioritized extreme availability and low latency for redirects over strong consistency for link metadata. While link creation requires strong consistency for alias uniqueness, a newly created or updated link might take a few milliseconds to propagate to all caches and database replicas globally before being consistently available for redirects.
  - **Why:** Redirects are the most critical and highest-volume operation. A slight delay in a new link becoming globally redirectable is acceptable, whereas any significant latency or downtime for redirects would severely impact user experience and the service's reliability targets.
- **Cost vs. Performance/Scalability:**
  - **Trade-off:** Opted for a multi-region, globally replicated architecture with extensive caching and specialized databases, which inherently incurs higher infrastructure costs than a single-region, simpler setup.
  - **Why:** The scale assumptions (1.5B redirects/day, global distribution) and performance targets (p95 < 80 ms) necessitate this level of distributed infrastructure. Commodity cloud services and open-source components (such as Kafka and Redis) were chosen where possible to optimize cost while meeting performance and scalability requirements.
- **Data Granularity vs. Storage Cost/Query Performance for Analytics:**
  - **Trade-off:** Retain raw click events for a shorter period (90 days) in a cost-effective data lake and store aggregated data for a longer period (2 years) in a more performant, but potentially more expensive, analytics database.
  - **Why:** Storing 1.5 billion raw events daily for 2 years would be prohibitively expensive and slow for real-time querying. This approach balances the need for detailed historical analysis (via raw data in S3) with the requirement for fast, aggregated insights (via the analytics DB) within budget constraints.
- **Short Code Length vs. Collision Probability/Guessability:**
  - **Trade-off:** Chose Base62 encoding for compact short codes (e.g., 6-10 characters). While a purely random 6-character code carries a theoretical collision risk, the ID-based generation strategy eliminates this for system-generated codes. For custom aliases, collision detection is explicit.
  - **Why:** Compactness is a core feature of a URL shortener.
The chosen strategy provides a namespace vast enough to avoid practical collisions for system-generated codes and handles custom-alias collisions gracefully, without requiring excessively long or complex short codes that would defeat the purpose of shortening.

- **Complexity vs. Feature Set:**
  - **Trade-off:** Focused on a robust core system for shortening, redirecting, and basic analytics, accepting that more advanced features (e.g., A/B testing, detailed user management, complex reporting) would be iterative additions.
  - **Why:** To meet the aggressive performance and availability targets for the core functionality within a reasonable design scope. Adding too many features initially would increase complexity, potential failure points, and development time, potentially compromising the core service's stability.
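The scale figures the answer works from (1.5B redirects/day, 120M new links/month) reduce to modest average per-second rates; quick arithmetic, not part of the original answer, makes the sizing concrete:

```python
# Back-of-envelope rates implied by the stated scale assumptions.
redirects_per_day = 1_500_000_000
new_links_per_month = 120_000_000

avg_redirects_per_sec = redirects_per_day / 86_400          # seconds per day
avg_creates_per_sec = new_links_per_month / (30 * 86_400)   # ~30-day month

print(round(avg_redirects_per_sec))  # 17361
print(round(avg_creates_per_sec))    # 46
```

So the read path must sustain roughly 17K redirects/sec on average globally (peaks will be several times higher), while writes average only tens per second, which supports the read-heavy cache-first design.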

Result

#2

Winning Votes

0 / 3

Average Score

74

Judge Models: OpenAI GPT-5.4

Total Score

75

Overall Comments

Answer A gives a coherent end-to-end architecture and covers all requested areas: metadata storage, cache, analytics pipeline, code generation, APIs, abuse controls, and trade-offs. Its main strengths are broad completeness and a sensible separation of redirect, creation, and analytics paths. However, it stays fairly generic in several places, provides limited quantitative sizing, is somewhat loose on multi-region consistency details, and does not dig deeply into tricky issues like hot-key mitigation, degraded modes, cache invalidation, or explicit cost/capacity reasoning.


Architecture Quality

Weight 30%
72

Solid high-level architecture with appropriate major components and sensible separation of redirect, creation, cache, metadata, and analytics. The design is coherent, but some decisions remain generic or optional, such as CDN usage and multi-region write strategy, and it lacks the same level of concrete operational detail as a top-tier answer.

Completeness

Weight 20%
83

Covers nearly all requested topics: architecture, storage, code generation, scaling, reliability, APIs, data models, abuse mitigation, and trade-offs. Minor gaps include less explicit cache invalidation/update behavior for disable/expire actions and less detailed treatment of analytics query dimensions and retention mechanics.

Trade-off Reasoning

Weight 20%
70

Provides reasonable trade-offs around consistency, cost, analytics retention, and code length, but the discussion is somewhat broad and high level. It does not explore nuanced product/technical trade-offs like redirect status code choice, cacheability versus analytics fidelity, or vendor/operational alternatives in much depth.

Scalability & Reliability

Weight 20%
74

Demonstrates a good grasp of read-heavy scaling with cache plus NoSQL DB and asynchronous analytics. Reliability coverage is decent, but some critical aspects are underspecified: explicit consistency levels, realistic hot-key handling beyond generic sharding, cache-failure load absorption, and quantified multi-region failover behavior.

Clarity

Weight 10%
80

Well organized and easy to follow, using numbered sections aligned to the prompt. Some sections are verbose and generic, and a few implementation details are described in broad terms rather than crisp design decisions.

Total Score

85

Overall Comments

Answer A provides a very strong and comprehensive design that correctly addresses all the requirements of the prompt. It proposes a standard, robust architecture with a clear separation of concerns for writes, reads, and analytics. The technology choices are appropriate, and the reasoning for them is sound. The answer is well-structured and easy to follow. Its primary weakness is a relative lack of depth and specificity when compared to Answer B, particularly in its strategies for handling hot keys and in the nuance of its trade-off analysis.


Architecture Quality

Weight 30%
85

The architecture is well-designed, with a clear separation of concerns (Shorten, Redirect, Analytics services) and appropriate component choices. The data flows are logical and cover all major use cases. It represents a solid, industry-standard approach.

Completeness

Weight 20%
90

The answer is very complete, addressing all eight sections requested in the prompt with sufficient detail. The APIs and data models are well-defined and cover the core requirements.

Trade-off Reasoning

Weight 20%
80

The answer discusses several important trade-offs, such as consistency vs. availability and cost vs. performance. The reasoning is logical and clearly connected to the design choices made.

Scalability & Reliability

Weight 20%
80

The plan for scaling and reliability is strong, covering multi-region deployment, caching, and standard failure recovery mechanisms. However, the strategy for handling hot keys is somewhat basic, mentioning sharding and rate limiting but lacking more advanced techniques.

Clarity

Weight 10%
90

The answer is very clear and well-structured. The use of numbered sections and bullet points makes the information easy to digest and follow.

Total Score

62

Overall Comments

Answer A provides a solid, well-structured system design that covers all eight required sections. It correctly identifies the major components (API Gateway, Shorten Service, Redirect Service, Redis, Cassandra/DynamoDB, Kafka, ClickHouse, S3) and describes reasonable data flows. The storage choices are appropriate with adequate rationale. The short-code generation strategy using Snowflake IDs with Base62 encoding is sound. The reliability plan covers key failure scenarios and degraded modes. APIs and data models are well-defined. Abuse mitigation is comprehensive. Trade-offs are discussed at a reasonable level. However, the answer remains somewhat generic in places — it lacks specific quantitative analysis (e.g., traffic math, capacity estimates, cost projections), doesn't discuss the 301 vs 302 trade-off for redirects (critical for analytics), doesn't address hot key mitigation beyond basic cache sharding, and doesn't provide concrete sizing for infrastructure components. The multi-region strategy mentions active-active but doesn't detail consistency levels or replication factors. Overall, it's a competent answer but lacks the depth and specificity that would distinguish it as exceptional.


Architecture Quality

Weight 30%
65

Answer A presents a clean architecture with appropriate component separation (write, read, analytics paths). The data flow is clearly described. However, it lacks specificity in areas like the CDN layer strategy, doesn't discuss 301 vs 302 redirect implications, and the multi-region strategy is somewhat vague without concrete consistency level specifications.

Completeness

Weight 20%
65

Answer A covers all eight required sections adequately. APIs, data models, abuse mitigation, and trade-offs are all present. However, it lacks quantitative capacity planning, cost estimates, specific infrastructure sizing, the 301/302 trade-off, GDPR considerations in detail, open redirect prevention, and concrete recovery time objectives. The analytics pipeline description is somewhat generic.

Trade-off Reasoning

Weight 20%
55

Answer A discusses five trade-offs that are reasonable but somewhat generic. The consistency vs availability trade-off is standard. The cost vs performance discussion lacks specific numbers. The short code length discussion is adequate. The trade-offs don't deeply engage with the specific constraints of the problem (e.g., no discussion of 301 vs 302, no discussion of Cassandra vs relational DB specifics, no analytics pipeline synchronous vs async trade-off).

Scalability & Reliability

Weight 20%
60

Answer A covers multi-region deployment, caching, sharding, and failure scenarios at a reasonable level. The hot key mitigation is limited to cache sharding and rate limiting. Disaster recovery mentions backups and multi-region but lacks specific RTO/RPO targets. Degraded mode behavior is described but without concrete fallback strategies. No specific capacity numbers or traffic math are provided.

Clarity

Weight 10%
65

Answer A is well-organized with clear section headers and consistent formatting. The writing is clear and easy to follow. Data models use a readable format. However, the lack of diagrams and quantitative details makes some sections feel abstract. The bullet-point style is consistent but sometimes leads to surface-level descriptions.


DESIGN: GLOBAL URL SHORTENING SERVICE

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. HIGH-LEVEL ARCHITECTURE AND DATA FLOW
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

COMPONENTS

```
[Client]
   │
   ▼
[Global Anycast DNS / GeoDNS]
   │
   ▼
[Regional Edge PoPs — CDN / Nginx reverse proxy]
   │  (cache hit  → immediate 301/302 response)
   │  (cache miss → forward to origin)
   ▼
[Regional API Gateway (rate-limiting, auth, TLS termination)]
   │
   ├──► [Redirect Service] ──► [Redis Cluster (hot-key cache)]
   │                               │ miss
   │                               ▼
   │                          [Link Metadata Store (Cassandra)]
   │
   └──► [Creation Service] ──► [ID Generator (Snowflake-style)]
            │
            ├──► [Link Metadata Store (Cassandra)]
            └──► [Custom Alias Checker (Redis SET + Cassandra)]

[Redirect Service] ──► [Click Event Publisher (Kafka)]
                           │
                           ▼
                [Stream Processor (Flink / Spark Streaming)]
                           │
               ┌───────────┴────────────┐
               ▼                        ▼
    [Raw Event Store            [Aggregated Analytics
     (S3 + Parquet, 90d)]        Store (ClickHouse, 2y)]
```

DATA FLOW — REDIRECT (CRITICAL PATH)

1. User visits https://sho.rt/aB3xYz
2. GeoDNS routes to nearest regional PoP.
3. Edge CDN checks its local cache for key "aB3xYz".
   - HIT: returns HTTP 301 (cacheable) or 302 (non-cacheable for analytics) immediately. Fires async beacon to analytics pipeline.
   - MISS: forwards to regional Redirect Service.
4. Redirect Service checks regional Redis cluster.
   - HIT: returns destination URL, fires Kafka event, responds 302.
   - MISS: reads from Cassandra (nearest replica), populates Redis, responds 302.
5. Kafka consumer (Flink) processes click event within seconds; aggregates land in ClickHouse within ~5 minutes; worst-case 15 minutes.

DATA FLOW — CREATION

1. Client POSTs to Creation Service via API Gateway.
2. Rate-limit and auth checks applied.
3. If custom alias: check Redis SET and Cassandra for collision; reject if taken.
4. If auto-generated: Snowflake ID → Base62 encode → 7-character code.
5. Write to Cassandra (quorum write, RF=3).
6.
Optionally warm Redis cache immediately. 7. Return short URL to client. ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2. STORAGE CHOICES ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ LINK METADATA STORE — Apache Cassandra Rationale: - 120M new links/month ≈ 46 links/sec average, with spikes 10–20×. Cassandra handles high write throughput with tunable consistency. - Primary access pattern is single-key lookup by short_code — Cassandra's partition-key model is ideal. - Multi-datacenter replication (RF=3 per region, 3 regions) gives 99.99%+ availability without a single point of failure. - Scales horizontally; no expensive cross-shard transactions needed. - TTL support is native — expired links are automatically tombstoned. - Cost: commodity nodes, no licensing fees. Schema (simplified): Table: links short_code TEXT PRIMARY KEY long_url TEXT owner_id UUID created_at TIMESTAMP expires_at TIMESTAMP (nullable; Cassandra TTL set on row) is_disabled BOOLEAN is_custom BOOLEAN Alternative considered: DynamoDB — good but vendor lock-in and cost at this scale; PostgreSQL — not horizontally scalable for this write volume without significant sharding complexity. REDIRECT PATH CACHE — Redis Cluster (per region) - Stores short_code → {long_url, is_disabled, expires_at} as a hash. - TTL on cache entry = min(link expiry, 24 hours) to avoid stale entries. - Cluster mode with 6 nodes (3 primary + 3 replica) per region; ~50 GB RAM per region covers the hot working set (top ~10M links). - Eviction policy: allkeys-lru. - Cost justified: Redis hit rate expected >95%; each cache miss costs a Cassandra read (~5–10 ms) plus latency; at 17,000 redirects/sec per region, avoiding Cassandra reads is critical for p95 target. ANALYTICS — RAW EVENTS: Apache Kafka + S3 (Parquet) - Kafka (3-broker cluster per region, topic: click_events, 64 partitions) buffers click events durably. 
- Flink consumers read from Kafka, enrich events (geo-IP, user-agent parsing),
  and write Parquet files to S3 every 5 minutes.
- An S3 lifecycle policy deletes raw files after 90 days.
- Cost: S3 is cheap (~$0.023/GB/month); 1.5B events/day × ~200 bytes ≈ 300 GB/day
  → ~27 TB per 90 days → ~$620/month storage.

ANALYTICS — AGGREGATED: ClickHouse
- Flink also writes pre-aggregated rows (per short_code, per hour, per
  country/device) to ClickHouse.
- ClickHouse's columnar storage and vectorized execution make time-series
  aggregation queries fast.
- Replication factor 2 per region; 2-year retention.
- Estimated size: ~1 TB/year aggregated → very manageable.
- Alternative considered: Apache Druid — also excellent but operationally
  heavier; ClickHouse is simpler for this use case.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
3. SHORT-CODE GENERATION STRATEGY
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

AUTO-GENERATED CODES
Strategy: Snowflake-style distributed ID → Base62 encoding
1. Each Creation Service instance holds a unique worker ID (assigned at startup
   via ZooKeeper or a simple DB table).
2. Generate a 64-bit Snowflake ID: [41-bit timestamp ms | 10-bit worker ID | 12-bit sequence].
3. Base62-encode the ID (characters: 0-9, a-z, A-Z).
4. A 64-bit integer encodes to at most 11 Base62 characters; in practice, for
   IDs generated in the first ~10 years, 7–8 characters suffice.
5. 7 characters in Base62 = 62^7 ≈ 3.5 trillion combinations — far exceeds
   120M/month × 12 months × 10 years ≈ 14.4 billion links.

Collision avoidance:
- Snowflake IDs are globally unique by construction (worker ID + timestamp +
  sequence). No collision is possible unless two workers share the same worker
  ID, which the coordination service prevents.
- No "check-then-insert" loop is needed for auto-generated codes.
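As a rough illustration of the layout and encoding described above, a Snowflake-style generator plus Base62 encoder might look like the following sketch. The custom epoch, the locking detail, and the names `SnowflakeGenerator` / `base62_encode` are assumptions for illustration, not the answer's exact implementation:

```python
import threading
import time

BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
EPOCH_MS = 1_600_000_000_000  # assumed custom epoch; keeps the timestamp in 41 bits


class SnowflakeGenerator:
    """64-bit IDs: [41-bit ms timestamp | 10-bit worker ID | 12-bit sequence]."""

    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024  # must fit in 10 bits
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000) - EPOCH_MS
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit sequence
                if self.sequence == 0:          # sequence exhausted this millisecond
                    while now <= self.last_ms:  # spin to the next millisecond
                        now = int(time.time() * 1000) - EPOCH_MS
            else:
                self.sequence = 0
            self.last_ms = now
            return (now << 22) | (self.worker_id << 12) | self.sequence


def base62_encode(n: int) -> str:
    """Encode a non-negative integer using the 0-9a-zA-Z alphabet."""
    if n == 0:
        return BASE62[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(BASE62[r])
    return "".join(reversed(out))
```

IDs produced this way encode to at most 11 Base62 characters, consistent with the 7–11 character range stated above.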
Guessability:
- Sequential Snowflake IDs encoded in Base62 are not random, but they are not
  trivially enumerable either (the timestamp component changes every millisecond
  and the sequence resets). For additional obscurity, XOR the lower 32 bits with
  a per-deployment secret before encoding. This is not cryptographic security,
  but it raises the bar for enumeration attacks.
- If stronger unpredictability is needed: generate 6 random bytes → Base62 → an
  8-character code; check Cassandra for a collision (rare at this scale); retry
  on collision. With ~1.4B existing links in a 62^8 ≈ 218T space, the expected
  collision rate is negligible (<0.001%).

CUSTOM ALIASES
1. The user provides a desired alias (e.g., "my-promo-2025").
2. Validate: 3–50 characters, alphanumeric plus hyphens, no reserved words
   (api, admin, health, etc.).
3. Check a Redis SET "custom_aliases" for existence (an O(1) membership check;
   a Bloom filter works as a cheaper approximation).
4. Attempt a Cassandra INSERT ... IF NOT EXISTS (lightweight transaction /
   compare-and-set).
5. If the alias is already taken, return HTTP 409 Conflict.
6. Custom aliases are stored with is_custom=true; the auto-generation path
   never overwrites them.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
4. SCALING PLAN FOR GLOBAL TRAFFIC
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

TRAFFIC MATH
- 1.5B redirects/day ≈ 17,400 req/sec average; assume a 3× peak ≈ 52,000 req/sec.
- 120M creations/month ≈ 46/sec average; a 10× peak ≈ 460/sec. Creation is not
  the bottleneck.

CACHING LAYERS
Layer 1 — CDN/Edge (Cloudflare, Fastly, or self-hosted Nginx at PoPs):
- Cache 301 responses for non-expiring, non-analytics-critical links.
  Cache-Control: max-age=3600.
- Use 302 for links where per-click analytics are required (most links); these
  bypass the CDN cache but still benefit from geographic proximity.
- Estimated CDN hit rate for popular links: 40–60% of traffic served at the
  edge without hitting origin.
Layer 2 — Regional Redis Cluster:
- Covers the remaining ~40–60% of requests that reach origin.
- Expected Redis hit rate: >95% of origin-reaching requests.
- Net Cassandra reads: <5% of 17,400 req/sec ≈ ~870 req/sec — well within
  Cassandra's capacity.

PARTITIONING / SHARDING
Cassandra:
- Partition key = short_code. Cassandra's consistent hashing distributes
  partitions evenly across nodes.
- No manual sharding needed; add nodes and the ring rebalances automatically.
- Hotspot avoidance: short_code is high-cardinality; no single partition grows
  disproportionately large.

Redis:
- Redis Cluster distributes 16,384 hash slots across nodes; short_code values
  hash evenly across them.
- Hot keys (viral links) are handled separately — see the Reliability section.

Kafka:
- 64 partitions per topic; partition key = short_code. Ensures ordered
  processing per link while parallelizing across consumers.

MULTI-REGION DEPLOYMENT
Regions: US-East, EU-West, AP-Southeast (minimum 3 for 99.99% availability).

Cassandra:
- Multi-datacenter replication: RF=3 per datacenter, 3 datacenters.
- Writes: LOCAL_QUORUM (2 of 3 nodes in the local DC) for creation — fast and
  consistent within a region.
- Reads: LOCAL_QUORUM for redirects — served from the nearest DC.
- Cross-DC replication is asynchronous; eventual consistency across regions is
  acceptable for redirects (a newly created link may take <1 second to
  propagate globally — acceptable).

Redis:
- Independent cluster per region; no cross-region replication (the cost and
  complexity are not justified).
- A cache miss falls back to the local Cassandra replica.

DNS:
- GeoDNS routes users to the nearest regional API Gateway.
- An Anycast IP for the redirect domain ensures lowest-latency routing.

Creation Service:
- Stateless; deployed in each region. Worker IDs are globally unique
  (coordinated via a lightweight global ZooKeeper or a simple DB table with a
  region prefix).

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
5. RELIABILITY PLAN
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

FAILURE SCENARIOS AND MITIGATIONS

Redis node failure:
- Redis Cluster promotes a replica automatically (typically <30 seconds).
- During failover, requests fall through to Cassandra. Cassandra can handle the
  burst (~870 req/sec normally; a burst toward ~17,000 req/sec for <30 seconds
  is manageable with proper provisioning — size Cassandra for 2× expected peak
  read load).
- Circuit breaker in the Redirect Service: if Redis is fully unavailable,
  bypass the cache and read directly from Cassandra.

Cassandra node failure:
- RF=3 with LOCAL_QUORUM means one node failure is transparent; two node
  failures degrade to LOCAL_ONE (still functional, slightly less consistent).
- Cassandra's gossip protocol detects failures within seconds; hinted handoff
  ensures writes are not lost.

Entire region failure:
- GeoDNS health checks detect region unavailability within 30–60 seconds and
  reroute traffic to the next-nearest region.
- Cassandra cross-DC replication ensures data is available in surviving regions.
- Targets: RTO < 2 minutes; RPO < 1 second for link metadata (within-DC quorum
  writes are synchronous, but cross-DC replication is async, so very recent
  creations may be lost in a region-level disaster — acceptable).

HOT KEYS (VIRAL LINKS)
Problem: a single viral short code could generate millions of requests per
second, overwhelming a single Redis slot or Cassandra partition.

Mitigations:
1. Local in-process cache in each Redirect Service instance (e.g., Caffeine,
   10,000-entry LRU, 30-second TTL). Absorbs repeated requests within a single
   pod without hitting Redis.
2. Redis key replication: for detected hot keys (tracked via a sliding-window
   counter; Redis MONITOR is too expensive for continuous production use),
   replicate the key to multiple Redis slots with a suffix (e.g., aB3xYz:0,
   aB3xYz:1 ... aB3xYz:N) and randomly select one on read.
3. CDN caching: for non-analytics links, push hot links to the CDN with a short
   TTL (60 seconds) to absorb traffic at the edge.
4. Cassandra: hot keys are not a problem here because Redis absorbs >99% of
   reads for hot links.

DISASTER RECOVERY
- Cassandra: daily snapshots to S3 (nodetool snapshot). Cross-region
  replication provides live DR.
- Redis: persistence (AOF + RDB) to local disk; Redis is a cache, so loss is
  acceptable (requests fall back to Cassandra). No cross-region Redis
  replication needed.
- Kafka: replication factor 3 within a region; cross-region MirrorMaker 2
  replicates click events to a secondary region for analytics DR.
- ClickHouse: replicated tables (ReplicatedMergeTree); daily backups to S3.

DEGRADED-MODE BEHAVIOR
- If the stream processors (Flink) are down: redirects continue unaffected.
  Click events are buffered in Kafka (7-day retention); analytics catch up when
  the pipeline recovers. If Kafka itself is unreachable, the Redirect Service
  drops click events rather than blocking the redirect path. The 15-minute
  analytics-freshness target may be temporarily violated — acceptable.
- If the Creation Service is down: existing redirects keep working (the read
  path is independent). New link creation fails gracefully with 503.
- If Cassandra is degraded (quorum unavailable): the Redirect Service falls
  back to Redis-only mode. Links not in Redis return 503 rather than an
  incorrect redirect. This is a last-resort degraded mode.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
6.
KEY APIs AND CORE DATA MODELS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

REST API

POST /api/v1/links
Request:
{
  "long_url": "https://example.com/very/long/path?q=1",
  "custom_alias": "my-promo",            // optional
  "expires_at": "2026-01-01T00:00:00Z",  // optional ISO-8601
  "owner_id": "uuid"                     // from auth token
}
Response 201:
{
  "short_code": "aB3xYz",
  "short_url": "https://sho.rt/aB3xYz",
  "long_url": "https://example.com/very/long/path?q=1",
  "expires_at": "2026-01-01T00:00:00Z",
  "created_at": "2025-01-15T10:00:00Z"
}
Errors: 400 (invalid URL), 409 (alias taken), 422 (URL blocked/malicious),
429 (rate limited)

GET /{short_code}
Response 302: Location: <long_url>
Response 301: Location: <long_url> (for cacheable, non-analytics links)
Response 404: link not found
Response 410: link expired or disabled

PATCH /api/v1/links/{short_code}
Request: { "is_disabled": true }   // or update expires_at
Response 200: updated link object
Auth: owner only

DELETE /api/v1/links/{short_code}
Response 204
Auth: owner only

GET /api/v1/links/{short_code}/analytics
Query params: from, to, granularity (hour/day), group_by (country/device/referrer)
Response 200:
{
  "short_code": "aB3xYz",
  "total_clicks": 142300,
  "series": [
    { "timestamp": "2025-01-15T10:00:00Z", "clicks": 1200, "country": "US" },
    ...
  ]
}

CORE DATA MODELS

links (Cassandra)
  short_code   TEXT PRIMARY KEY
  long_url     TEXT NOT NULL
  owner_id     UUID
  created_at   TIMESTAMP
  expires_at   TIMESTAMP (nullable; also set as Cassandra row TTL)
  is_disabled  BOOLEAN DEFAULT false
  is_custom    BOOLEAN DEFAULT false
  abuse_flags  INT DEFAULT 0

click_events (Kafka → S3 Parquet, raw)
  event_id      UUID
  short_code    TEXT
  timestamp     TIMESTAMP
  ip_hash       TEXT (SHA-256 of IP, for privacy)
  country_code  TEXT (from geo-IP lookup)
  user_agent    TEXT
  referrer      TEXT
  region        TEXT (serving region)

analytics_hourly (ClickHouse, aggregated)
  short_code       TEXT
  hour             DateTime
  country_code     TEXT
  device_type      TEXT (mobile/desktop/bot)
  referrer_domain  TEXT
  click_count      UInt64
  unique_ips       UInt64 (HyperLogLog approximation)
  PRIMARY KEY (short_code, hour, country_code, device_type)
  ENGINE = ReplicatedMergeTree

users (PostgreSQL — small table, not on the hot path)
  user_id          UUID PRIMARY KEY
  email            TEXT UNIQUE
  api_key_hash     TEXT
  rate_limit_tier  TEXT
  created_at       TIMESTAMP

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
7. ABUSE MITIGATION AND SECURITY
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

MALICIOUS URL DETECTION
1. Safe Browsing API check at creation time: submit long_url to Google Safe
   Browsing (or equivalent) before storing. Reject with 422 if flagged.
   Latency: ~50–100 ms, acceptable within the 300 ms creation SLA.
2. Domain blocklist: maintain a Redis SET of known-bad domains (phishing,
   malware). Check synchronously at creation. Updated hourly from threat
   intelligence feeds.
3. Async re-scanning: a background job periodically re-scans stored URLs
   against updated blocklists. If a previously clean URL is later flagged, set
   is_disabled=true and abuse_flags=1. The Redirect Service checks is_disabled
   on every request (from cache).
4. URL normalization: parse and normalize URLs at creation to prevent
   obfuscation (e.g., URL-encoding tricks, homograph attacks on domain names —
   check for IDN homographs).
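The creation-time checks above (scheme allow-listing, domain blocklist, IDN-homograph screening) can be sketched roughly as follows. The blocklist contents, the `validate_long_url` name, and the coarse "flag any punycode label" heuristic are illustrative assumptions, not the design's exact rules:

```python
from urllib.parse import urlsplit

# Illustrative blocklist; in the design this lives in a Redis SET
# refreshed hourly from threat-intelligence feeds.
BLOCKED_DOMAINS = {"evil.example", "phish.example"}


def validate_long_url(raw: str) -> str:
    """Normalize and validate a long URL at creation time.

    Raises ValueError for anything the design would reject with 400/422;
    returns the normalized URL to store otherwise.
    """
    parts = urlsplit(raw.strip())
    if parts.scheme not in ("http", "https"):   # open-redirect guard
        raise ValueError("only http/https URLs are allowed")
    host = parts.hostname                       # urlsplit lowercases the host
    if not host:
        raise ValueError("URL has no host")
    # IDN homographs: non-ASCII hostnames become xn-- labels under IDNA;
    # flagging punycode labels for manual review is one coarse screen.
    try:
        ascii_host = host.encode("idna").decode("ascii")
    except UnicodeError:
        raise ValueError("hostname fails IDNA encoding")
    if any(label.startswith("xn--") for label in ascii_host.split(".")):
        raise ValueError("punycode hostname flagged for homograph review")
    if ascii_host in BLOCKED_DOMAINS:
        raise ValueError("domain is blocklisted")
    # Reassemble with the normalized host; keep path/query as submitted.
    netloc = ascii_host if parts.port is None else f"{ascii_host}:{parts.port}"
    return parts._replace(netloc=netloc).geturl()
```

A real deployment would treat flagged punycode hosts as "hold for review" rather than reject outright, since many internationalized domains are legitimate.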
RATE LIMITING
- Per-IP rate limit at the API Gateway: 10 creations/minute for anonymous
  users, 100/minute for authenticated users, configurable per tier.
- Redis-based sliding-window counter (or token bucket) per IP/user.
- Global rate limit: 10,000 creations/second system-wide (well above the
  expected peak of 460/sec).
- Redirect path: no rate limit (public); DDoS protection via the CDN
  (Cloudflare DDoS mitigation or equivalent).

ENUMERATION PREVENTION
- Short codes are not purely sequential (Snowflake XOR obfuscation or random
  generation).
- No API endpoint lists all short codes.
- Unauthenticated bulk redirect attempts (>1,000 unique codes/minute from one
  IP) trigger a CAPTCHA challenge or a temporary IP block.

AUTHENTICATION AND AUTHORIZATION
- API key or OAuth 2.0 JWT for creation and management endpoints.
- PATCH/DELETE require ownership verification (owner_id matches the token).
- Admin API (a separate internal endpoint) lets the abuse team disable links.

PRIVACY
- IP addresses are hashed (SHA-256 with a per-deployment salt) before storage
  in analytics. Raw IPs are never persisted.
- GDPR: users can request deletion of their links and associated analytics.
  Cassandra DELETE plus an S3 lifecycle tag for raw events.

HTTPS ENFORCEMENT
- All short URLs are served over HTTPS only; HTTP requests are redirected to
  HTTPS at the edge.
- HSTS header on the short domain.

OPEN REDIRECT PREVENTION
- Only redirect to URLs that were explicitly registered. No dynamic URL
  construction from user input at redirect time.
- Validate the long_url scheme at creation: only http:// and https:// are allowed.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
8. KEY TRADE-OFFS AND JUSTIFICATIONS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

TRADE-OFF 1: 302 vs 301 for redirects
- 301 (permanent) is cached by browsers indefinitely → lowest latency and zero
  server load for repeat visits, but analytics are lost (the browser never
  calls the server again), and disabling or expiring a link is impossible for
  clients with a cached redirect.
- 302 (temporary) hits the server every time → full analytics, instant
  disable/expiry; slightly higher latency for repeat visits.
- Decision: use 302 by default for all links. Offer 301 as an opt-in for users
  who explicitly want maximum caching and do not need analytics. This is the
  right default for a service that sells analytics.

TRADE-OFF 2: Cassandra vs. a relational DB for link metadata
- A relational DB (PostgreSQL) would be simpler to operate and supports richer
  queries, but scaling it to ~17,000 reads/sec globally is painful without
  significant sharding complexity (the writes — 46/sec average with 10–20×
  spikes — are the easy part).
- Cassandra sacrifices rich query support (no joins, limited secondary indexes)
  for linear horizontal scalability and multi-DC replication. Since the only
  hot query is a single-key lookup by short_code, the trade-off is entirely
  acceptable.

TRADE-OFF 3: Eventual consistency for cross-region link propagation
- A link created in US-East may not be immediately visible in AP-Southeast
  (sub-second lag). A user who creates a link and immediately shares it
  globally could see a brief 404 in distant regions.
- Alternative: synchronous cross-region writes (EACH_QUORUM) — guarantees
  global consistency but adds 100–200 ms of latency to creation (violating the
  300 ms SLA under adverse conditions) and reduces availability.
- Decision: accept eventual consistency. The sub-second propagation window is
  an acceptable UX trade-off; document it. The creation SLA is met.

TRADE-OFF 4: Snowflake IDs vs. random codes
- Snowflake IDs are monotonically increasing → slightly predictable, but with
  no collision risk, no DB check needed, and operational simplicity.
- Random codes → better unpredictability, but they require a collision check
  (rare, but it adds latency and complexity).
- Decision: Snowflake + XOR obfuscation for auto-generated codes. This offers
  a good balance of simplicity, performance, and reasonable unpredictability.
  Random codes remain an option for security-sensitive use cases.

TRADE-OFF 5: Separate analytics pipeline vs. synchronous writes
- Writing analytics synchronously to a DB on every redirect would add latency
  to the critical path and create a write bottleneck at 17,000 req/sec.
- An async Kafka pipeline decouples analytics from the redirect path entirely.
  The cost is eventual consistency in analytics (up to 15 minutes) and added
  operational complexity (Kafka + Flink + ClickHouse).
- Decision: async pipeline. The 15-minute analytics-freshness requirement
  explicitly allows this; the redirect path must be fast, and analytics can lag.

TRADE-OFF 6: CDN caching scope
- Caching all redirects at the CDN (301) would give maximum performance but
  break analytics and disable/expiry.
- Caching only non-analytics, non-expiring links at the CDN is a middle ground.
- Decision: the CDN caches only links explicitly marked cacheable (301 opt-in).
  All others use 302 with Cache-Control: no-store. This is the correct default
  for a service where analytics are a core feature.

TRADE-OFF 7: ClickHouse vs. Druid vs. BigQuery for analytics
- BigQuery: fully managed, no ops, but high query latency (seconds) and high
  cost at this event volume.
- Druid: excellent for real-time analytics but operationally complex
  (ZooKeeper, Broker, Historical, and MiddleManager nodes).
- ClickHouse: simpler to operate than Druid, excellent query performance,
  cost-effective on commodity hardware, good Kafka integration.
- Decision: ClickHouse — the best balance of operational simplicity,
  performance, and cost for this use case.
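The 302-by-default decision in Trade-off 1 reduces to a small amount of handler logic. This sketch uses hypothetical names (`LinkRecord`, `resolve_redirect`, a per-link `cacheable_301` flag) to show how disable/expiry and the 301 opt-in interact; it is an illustration of the decision above, not the answer's actual code:

```python
import time
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class LinkRecord:
    long_url: str
    is_disabled: bool = False
    expires_at: Optional[float] = None   # unix seconds; None = never expires
    cacheable_301: bool = False          # per-link opt-in to permanent redirects


def resolve_redirect(link: Optional[LinkRecord],
                     now: Optional[float] = None) -> Tuple[int, dict]:
    """Map a cache/DB lookup result to an (HTTP status, headers) pair."""
    now = time.time() if now is None else now
    if link is None:
        return 404, {}
    if link.is_disabled or (link.expires_at is not None and now >= link.expires_at):
        return 410, {}                   # gone: expired or manually disabled
    if link.cacheable_301:
        # Opt-in permanent redirect: edge/browser caching allowed,
        # per-click analytics forfeited.
        return 301, {"Location": link.long_url,
                     "Cache-Control": "public, max-age=3600"}
    # Default: temporary redirect, never cached, so every click is observable
    # and disable/expiry take effect immediately.
    return 302, {"Location": link.long_url, "Cache-Control": "no-store"}
```

Keeping this mapping in one place makes the 410-before-redirect ordering (disabled links must never leak their destination) easy to enforce and test.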
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CAPACITY SUMMARY (3-REGION DEPLOYMENT)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Redirect Service:  ~30 pods × 3 regions (auto-scaled, stateless)
Creation Service:  ~5 pods × 3 regions
Cassandra:         9 nodes × 3 regions (RF=3 per DC), ~2 TB SSD each
Redis Cluster:     6 nodes × 3 regions, 50 GB RAM each
Kafka:             3 brokers × 3 regions
Flink:             10 task managers × 3 regions
ClickHouse:        4 nodes × 3 regions (replicated)
S3:                ~27 TB raw events per 90 days + ~3 TB aggregated per 2 years

Estimated monthly infrastructure cost: $15,000–$25,000 (commodity cloud VMs),
dominated by Cassandra storage and Redis memory. CDN egress is the largest
variable cost at scale.
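The headline figures in the traffic math and storage estimates above can be re-derived in a few lines (rounding as the answer does; the 200-byte event size and 3× peak factor are the answer's own assumptions):

```python
# Back-of-envelope checks for the traffic and storage figures quoted above.
redirects_per_day = 1.5e9
avg_rps = redirects_per_day / 86_400          # ~17,400 req/sec average
peak_rps = avg_rps * 3                        # ~52,000 req/sec at the assumed 3x peak

creations_per_month = 120e6
avg_creates = creations_per_month / (30 * 86_400)   # ~46 creations/sec

event_bytes = 200
raw_per_day_gb = redirects_per_day * event_bytes / 1e9   # ~300 GB/day of raw events
raw_90d_tb = raw_per_day_gb * 90 / 1000                  # ~27 TB per 90-day window
s3_monthly_usd = raw_90d_tb * 1000 * 0.023               # ~$620/month at $0.023/GB

codes_7 = 62 ** 7                    # ~3.5 trillion possible 7-char codes
ten_year_links = 120e6 * 12 * 10     # ~14.4 billion links over 10 years
```

Even with two orders of magnitude of headroom, the 7-character Base62 space comfortably covers a decade of growth, which is why the design never needs to lengthen codes.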

Result

#1 | Winner

Winning Votes

3 / 3

Average Score

91
Judge Models OpenAI GPT-5.4

Total Score

90

Overall Comments

Answer B presents a stronger, more production-oriented design with clear component separation, concrete data flow, storage rationale, consistency choices, capacity math, and detailed reliability handling. It directly addresses the scale and latency goals with layered caching, regional deployment, explicit Cassandra consistency levels, Kafka/Flink/ClickHouse analytics, and richer failure-mode analysis. It also includes practical API definitions, data models, abuse defenses, and explicit trade-offs on 301 vs 302, eventual consistency, and cost/operational complexity.


Architecture Quality

Weight 30%
89

Very strong architecture with clear read/write/analytics separation, explicit regional edge layer, cache hierarchy, metadata store, streaming pipeline, and concrete request flows. The design ties component choices directly to latency and freshness requirements and reflects a more production-ready end-to-end system.

Completeness

Weight 20%
92

Covers all requested areas thoroughly, including architecture, storage, code generation, scaling, reliability, APIs, models, abuse mitigation, and trade-offs. It also adds useful details such as traffic math, retention sizing, consistency levels, and degraded modes that strengthen completeness.

Trade-off Reasoning

Weight 20%
91

Excellent trade-off discussion with concrete decisions and alternatives: 301 vs 302, Cassandra vs relational DB, eventual consistency vs synchronous global writes, Snowflake vs random codes, async analytics vs synchronous writes, CDN cache scope, and analytics-store selection. The reasoning is explicit and well tied to stated requirements.

Scalability & Reliability

Weight 20%
90

Strong scalability and reliability planning with traffic math, layered caches, partitioning strategy, regional deployment, Cassandra replication/consistency choices, Kafka partitioning, hot-key mitigation, failover scenarios, DR measures, and degraded modes. The answer is notably stronger in operational realism and failure analysis.

Clarity

Weight 10%
86

Very clear and structured, with strong headings, separated data flows, storage rationale, failure scenarios, and trade-offs. It is dense, but the organization makes the design easy to evaluate and the decisions easy to trace.

Total Score

98

Overall Comments

Answer B is an exceptional, principal-level system design. It not only covers all requirements but does so with outstanding depth, specificity, and clarity. The design is grounded in quantitative analysis (e.g., "Traffic Math"), and the technology choices are justified with detailed comparisons to alternatives. Its reliability plan, especially the multi-layered strategy for handling hot keys, is particularly advanced. The trade-off analysis is nuanced and covers critical, practical dilemmas. The formatting and inclusion of extra details like a capacity summary and cost estimate make it a truly top-tier response.


Architecture Quality

Weight 30%
95

The architecture is outstanding. It presents a highly detailed, multi-layered system (CDN, regional PoPs, services) and specifies concrete technologies with strong justifications. The text-based flow diagrams are extremely effective at communicating the design, particularly the critical redirect path.

Completeness

Weight 20%
100

The answer is exceptionally complete. It addresses all prompt requirements in great detail and goes above and beyond by including traffic calculations, a detailed capacity summary, and an estimated infrastructure cost, which adds significant practical value to the design.

Trade-off Reasoning

Weight 20%
98

The trade-off analysis is exemplary. It identifies seven distinct, highly relevant trade-offs and discusses them with deep insight, such as the practical implications of 301 vs. 302 redirects for analytics, and compares specific technology alternatives (e.g., ClickHouse vs. Druid vs. BigQuery).

Scalability & Reliability

Weight 20%
98

This is a standout section. The scalability plan is grounded in traffic math, and the reliability plan is extremely robust. The multi-layered strategy for mitigating hot keys (in-process cache, Redis key replication, CDN) is particularly sophisticated and demonstrates a deep understanding of real-world failure modes at scale.

Clarity

Weight 10%
100

The clarity is perfect. The use of structured formatting, headers, and text-based flow diagrams makes a highly complex design exceptionally easy to understand. The logical flow from high-level architecture to low-level data models is flawless.

Total Score

85

Overall Comments

Answer B delivers an exceptionally detailed and well-structured system design that goes significantly beyond the baseline. It includes ASCII architecture diagrams, concrete traffic math (17,400 req/sec, peak calculations), specific infrastructure sizing (node counts, RAM, storage), and estimated monthly costs ($15,000-$25,000). The storage section provides detailed rationale including cost calculations (e.g., $620/month for S3 raw events). The short-code generation strategy includes a thoughtful XOR obfuscation technique for anti-enumeration. The scaling plan provides specific numbers for CDN hit rates, Redis hit rates, and resulting Cassandra load. The reliability plan addresses specific failure scenarios with concrete recovery times (RTO < 2 minutes) and includes sophisticated hot-key mitigations (local in-process cache, Redis key replication with suffixes). The 301 vs 302 trade-off discussion is particularly insightful and directly relevant to the analytics requirement. The API design includes proper HTTP status codes for edge cases (410 for expired/disabled links, 422 for malicious URLs). Seven distinct trade-offs are discussed with clear reasoning. The capacity summary provides a concrete deployment blueprint. Minor weaknesses include the answer being very long and some areas where operational complexity could be discussed more (e.g., managing ZooKeeper for Snowflake worker IDs across regions).


Architecture Quality

Weight 30%
85

Answer B provides a more detailed and coherent architecture with ASCII diagrams, explicit CDN/edge layer with the critical 301 vs 302 distinction, specific consistency levels (LOCAL_QUORUM), and a clear separation of concerns. The data flow descriptions include specific cache hit/miss paths and the analytics pipeline is well-integrated. The architecture directly addresses the stated traffic and latency goals with quantitative backing.

Completeness

Weight 20%
85

Answer B covers all eight sections with significantly more depth. It includes traffic math, capacity summary with specific node counts and RAM, cost estimates ($15K-$25K/month), S3 storage cost calculations, RTO/RPO targets, GDPR deletion procedures, open redirect prevention, URL normalization, HSTS headers, enumeration prevention, and a users table for auth. The 301 vs 302 trade-off and CDN caching scope are addressed as distinct considerations. A capacity summary section provides additional completeness.

Trade-off Reasoning

Weight 20%
85

Answer B discusses seven well-articulated trade-offs with specific reasoning tied to the problem constraints. The 301 vs 302 trade-off is particularly insightful and directly relevant to the analytics requirement. The Cassandra vs relational DB trade-off includes specific throughput numbers. The eventual consistency trade-off mentions the specific SLA impact. The ClickHouse vs Druid vs BigQuery comparison provides concrete operational reasoning. Each trade-off clearly states the decision and why, with quantitative backing where appropriate.

Scalability & Reliability

Weight 20%
85

Answer B provides detailed traffic math (17,400 req/sec average, 52,000 peak), specific cache hit rate estimates (CDN 40-60%, Redis >95%), resulting Cassandra load calculations (~870 req/sec), and concrete infrastructure sizing. Hot key mitigation includes three sophisticated strategies (local in-process cache, Redis key replication with suffixes, CDN push). Failure scenarios include specific recovery times (RTO < 2 minutes, RPO < 1 second). Redis failover behavior is described with timing (<30 seconds). Cassandra degradation path is clearly specified.

Clarity

Weight 10%
80

Answer B uses ASCII art diagrams, clear section separators, and a structured format that makes it easy to navigate. The inclusion of specific numbers, calculations, and concrete examples enhances clarity. The trade-off sections are particularly well-structured with clear decision statements. The capacity summary provides a quick reference. The answer is longer but the additional length is justified by substantive content rather than verbosity.

Comparison Summary

Final rank order is determined by judge-wise rank aggregation (average rank + Borda tie-break). Average score is shown for reference.

Judges: 3

Winning Votes

0 / 3

Average Score

74

Winning Votes

3 / 3

Average Score

91

Judging Results

Why This Side Won

Answer B wins decisively across all criteria. It provides significantly more depth and specificity than Answer A, including concrete traffic calculations, infrastructure sizing, cost estimates, and nuanced trade-off discussions (especially the 301 vs 302 redirect trade-off critical for analytics). Answer B's reliability plan is more detailed with specific failure scenarios and recovery metrics. Its scaling plan includes quantitative analysis of cache hit rates and resulting database load. The hot-key mitigation strategy (local in-process cache, Redis key replication with suffixes) is more sophisticated. The abuse mitigation section includes additional considerations like URL normalization, open redirect prevention, and async re-scanning. The capacity summary provides a concrete deployment blueprint that Answer A lacks entirely. While Answer A is competent, Answer B demonstrates the kind of depth and practical engineering judgment expected in a top-tier system design.

Why This Side Won

Answer B wins decisively due to its superior depth, specificity, and quantitative reasoning across all criteria. While Answer A provides a very good design, Answer B's analysis is at a higher level of engineering maturity. This is most evident in its detailed scalability plan grounded in traffic calculations, its advanced multi-layered strategy for handling hot keys, and its more nuanced and comprehensive discussion of trade-offs. The exceptional clarity and completeness, including bonus sections on capacity and cost, further solidify its position as the better answer.

Judge Models OpenAI GPT-5.4

Why This Side Won

Answer B wins because it scores higher on the most heavily weighted criteria: architecture quality, trade-off reasoning, and scalability/reliability. Both answers are complete and broadly correct, but B is more concrete and realistic about global traffic, cache layering, consistency semantics, hot keys, regional failover, analytics freshness, and cost/capacity implications. That extra rigor makes it the stronger system design response overall.
