Orivel Orivel
Open menu

Latest Tasks & Discussions

Browse the latest benchmark content across tasks and discussions. Switch by genre to focus on what you want to compare.

Benchmark Genres

Model Directory

Analysis

OpenAI GPT-5.5 VS Google Gemini 2.5 Flash

Choosing a Database for a Growing SaaS Startup

You are advising the CTO of a two-year-old B2B SaaS startup that provides project management software to mid-sized companies. The current setup uses a single PostgreSQL instance, and it is now showing strain: read queries on dashboards take 3–8 seconds during peak hours, the database is 800 GB and growing ~40 GB/month, and the team expects user count to triple over the next 12 months. The engineering team has 9 developers, only one of whom has significant database administration experience. Budget is constrained but not severely limited. The CTO is weighing four options: 1. Vertically scale the existing PostgreSQL instance and add read replicas. 2. Migrate to a managed distributed SQL database (e.g., CockroachDB or Spanner-like service). 3. Split the workload: keep PostgreSQL for transactional data, introduce a separate analytical store (e.g., ClickHouse or BigQuery) for dashboards. 4. Migrate to a NoSQL document database (e.g., MongoDB or DynamoDB). Write an analysis (roughly 500–800 words) that: - Evaluates each of the four options against the startup's specific constraints (performance bottleneck location, team expertise, growth trajectory, budget). - Identifies the key trade-offs and risks of each option. - Reaches a clear, justified recommendation (you may recommend one option or a phased combination). - Specifies what evidence or measurements you would want to verify before committing to the recommendation. Be concrete: refer to the numbers given, and avoid generic database advice that ignores the scenario.

274
May 16, 2026 09:38

Coding

OpenAI GPT-5.5 VS Google Gemini 2.5 Flash

Rate Limiter with Sliding Window and Burst Allowance

Design and implement a thread-safe rate limiter in a language of your choice (Python, Go, Java, TypeScript, or Rust) that supports the following requirements: 1. **API surface**: Expose at least these operations: - `allow(client_id: str, cost: int = 1) -> bool` — returns whether the request is permitted right now. - `retry_after(client_id: str) -> float` — returns seconds until at least 1 unit of capacity is available (0 if currently allowed). - A constructor that accepts per-client configuration: `rate` (units per second), `burst` (max units stored), and an optional `window_seconds` for sliding-window accounting. 2. **Algorithm**: Implement a hybrid that combines a **token bucket** (for burst tolerance) with a **sliding-window log or counter** (to bound the total requests permitted within `window_seconds`, preventing sustained abuse that a pure token bucket would allow after refills). A request is permitted only if both checks pass. Justify your data-structure choice for the sliding window (exact log vs. weighted two-bucket approximation) and discuss memory/accuracy tradeoffs in a short comment block or accompanying note. 3. **Concurrency**: The limiter will be hit by many threads/goroutines concurrently for the same and different `client_id`s. Avoid a single global lock becoming a bottleneck (e.g., per-client locks or lock striping). Document why your approach is correct under concurrent `allow` calls (no double-spend of tokens, no lost updates). 4. **Time source**: Make the clock injectable so tests are deterministic. Use a monotonic clock by default. 5. **Edge cases to handle explicitly**: - `cost` larger than `burst` (must reject, never block forever). - Clock going backwards or large pauses (e.g., suspended VM): clamp rather than crash, and don't grant unbounded tokens. - First-ever request for a new client (lazy initialization). - Stale client cleanup (memory must not grow unbounded if clients stop calling). - Fractional tokens / sub-millisecond timing. 6. **Tests**: Provide at least 6 unit tests using the injectable clock that cover: basic allow/deny, burst draining and refill, sliding-window cap independent of bucket refill, `cost > burst`, concurrent contention on one client (deterministic property: total permitted in T seconds ≤ rate*T + burst), and stale-client eviction. 7. **Complexity**: State the amortized time complexity of `allow` and the memory complexity per client. Deliver: complete runnable code (single file is fine, but you may split files if you label them clearly), the tests, and a brief design note (max ~250 words) explaining your choices and the precise semantics when the two algorithms disagree.

254
May 12, 2026 09:45

Planning

OpenAI GPT-5.5 VS Google Gemini 2.5 Pro

72-Hour Product Launch Recovery Plan

You are the interim project lead for a mid-sized SaaS company. Your team was scheduled to launch a major new feature ("Smart Reports") to all paying customers in 72 hours (Friday 5:00 PM, in your timezone). It is now Tuesday 5:00 PM. This morning, the following problems surfaced simultaneously: 1. QA discovered a critical bug: under specific timezone settings, exported PDF reports show incorrect totals (off by up to 8%). Reproduction is reliable; root cause is suspected but not confirmed. 2. The lead backend engineer (the only person who knows the reporting service deeply) is out sick and unreachable until Thursday morning at the earliest. 3. Marketing has already sent a teaser email to 40,000 customers promising Friday availability, and a press embargo lifts Friday at 9:00 AM. 4. Customer Support has flagged that 3 enterprise customers (combined ARR ~$600k) explicitly requested this feature in their renewal conversations and expect it on Friday. 5. Your CEO wants the launch to proceed but says "do not ship something embarrassing." Available resources: 2 backend engineers (mid-level, unfamiliar with reporting service), 1 senior frontend engineer, 1 QA engineer, 1 technical writer, 1 product manager (you), access to a feature-flag system, a staging environment, and Customer Support staff. Produce a concrete, sequenced 72-hour action plan that gets to the best feasible outcome by Friday 5:00 PM. Your plan must include: - A timeline broken into clear time blocks (with approximate clock times across Tue evening, Wed, Thu, Fri). - Specific owners for each action (by role). - Decision points / go-no-go gates with explicit criteria. - A prioritized risk register (top 4–6 risks) with mitigations and contingencies. - A communication plan covering the CEO, the 3 enterprise customers, the broader 40k email list, and internal staff — including what to say if you must delay or do a partial launch. - A clearly stated recommendation: full launch, partial/gated launch, or delayed launch, with justification tied to your constraints. Keep the plan realistic and actionable. Avoid generic advice; tie every action to the constraints above.

248
May 9, 2026 09:41

Counseling

OpenAI GPT-5.5 VS Google Gemini 2.5 Flash

Supporting a Friend Who Cancels Plans Repeatedly

A user writes to you for advice: "One of my close friends, Mia, has cancelled our plans at the last minute four times in the past two months. Each time she apologizes and says she's just been tired or 'not feeling up to it,' but she never explains more. I care about her and I don't want to add pressure if she's going through something, but I'm also starting to feel hurt and a bit taken for granted. I've been looking forward to our hangouts and rearranging my schedule for them. I don't know whether to bring it up directly, give her space, or just stop initiating. We're both 28 and have been friends for about six years. How should I handle this?" Please respond directly to this user. Your response should: 1. Acknowledge and validate their feelings without being saccharine. 2. Help them think through what might be going on (without diagnosing Mia or assuming the worst). 3. Offer concrete, practical options for how to approach the situation, including suggested phrasing they could actually use in a conversation or message with Mia. 4. Note when it might be appropriate to gently check in on Mia's wellbeing, and what to do if she signals she's struggling with something more serious — including a brief, non-alarmist mention that professional support exists if needed. 5. Respect the user's autonomy: do not lecture, moralize, or insist on a single "correct" answer. Keep the response warm but grounded, around 350–500 words.

313
May 8, 2026 09:39

Education Q&A

OpenAI GPT-5.5 VS Google Gemini 2.5 Flash-Lite

Explain Why Ice Floats: A Hard Chemistry Exam Question

Solid water (ice) is less dense than liquid water near 0 °C, which is unusual compared with most substances whose solid phases are denser than their liquid phases. Write an exam-style essay answer (roughly 350–550 words) that addresses ALL of the following points: 1. State the approximate densities of ice at 0 °C and liquid water at 0 °C and at 4 °C, and identify the temperature at which liquid water reaches its maximum density. 2. Explain, at the molecular level, why ice has a lower density than liquid water. Your explanation must reference: hydrogen bonding, the tetrahedral coordination of water molecules in hexagonal ice (Ih), and the open lattice structure with empty cavities. 3. Explain why liquid water near 0 °C is denser than ice but still less dense than water at 4 °C. Describe the competition between two effects as temperature rises from 0 °C to 4 °C: the partial collapse of residual ice-like hydrogen-bonded clusters (which increases density) and normal thermal expansion (which decreases density). 4. Give at least two important ecological or geophysical consequences of this anomaly (for example, lake stratification in winter, survival of aquatic life, or the behavior of sea ice). 5. Briefly compare water with one other small molecule (e.g., H2S, NH3, or CH4) to show why hydrogen bonding specifically — not just molecular size or polarity — is responsible for the anomaly. Be precise with terminology (e.g., "hydrogen bond" vs. "covalent bond", "density" vs. "specific volume"). Where you cite numerical values, give them with appropriate units and reasonable significant figures.

349
Apr 28, 2026 09:37

Showing 21 to 40 of 46 results

Related Links

X f L