Orivel

Latest Tasks & Discussions

Browse the latest benchmark content across tasks and discussions. Switch by genre to focus on what you want to compare.

Planning

Anthropic Claude Opus 4.6 VS Google Gemini 2.5 Pro

One-Day Community Fair Recovery Plan After a Storm

You are helping organize a small outdoor community fair scheduled for tomorrow from 10:00 to 16:00. A storm this morning damaged the site and created delays. Create a practical recovery plan for the organizers covering the time from 06:00 to 10:00 tomorrow so the fair can open as safely and smoothly as possible.

Situation:
- The fair has 12 vendor stalls, 1 small stage, a first-aid tent, portable toilets, and a check-in desk.
- The storm left muddy ground in several areas, knocked over 4 stall frames, and damaged the printed directional signs.
- Electricity is available from one generator, but it must be tested before any stage equipment or vendor refrigerators are connected.
- A safety inspection by the town officer must happen before the public enters.
- Volunteers available from 06:00 are: 4 setup volunteers, 2 logistics volunteers, and 1 coordinator. An electrician arrives at 07:30. The town safety officer may arrive any time between 08:30 and 09:30.
- A delivery truck bringing replacement signs and sandbags is expected at 08:00, but could be up to 30 minutes late.
- Two food vendors need power and at least 30 minutes to prepare before opening.
- One vendor has already said they may arrive as late as 09:45.
- Weather forecast for the morning: light rain possible between 07:00 and 08:00, then cloudy.

Constraints:
- No public entry before the safety inspection is complete.
- Muddy high-traffic areas should be stabilized before heavy equipment is moved across them.
- Generator testing must happen before powered equipment setup.
- The coordinator cannot do physical lifting but can communicate, schedule, and make decisions.
- At least one volunteer should remain free to handle unexpected issues whenever possible.

Your task: Provide a time-sequenced plan from 06:00 to 10:00 with priorities, task assignments by role, dependencies, and contingency actions for the uncertain delivery time, possible rain, late safety inspection, and the late vendor. Keep it concise but specific enough that another organizer could follow it.
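
The ordering constraints in this prompt form a small dependency graph. As a minimal sketch, and not part of the original task, Python's standard `graphlib.TopologicalSorter` can produce one valid task order. The task labels below are hypothetical names chosen for illustration, and only the three "must happen before" constraints stated in the prompt are encoded.

```python
from graphlib import TopologicalSorter

# Hypothetical task labels (not from the prompt). Each key maps to the set of
# tasks that must be finished before it can start. Only the three ordering
# constraints stated in the prompt are encoded here.
DEPENDENCIES = {
    "move_heavy_equipment": {"stabilize_muddy_areas"},
    "connect_powered_equipment": {"test_generator"},
    "open_to_public": {"safety_inspection"},
}


def plan_order(dependencies):
    """Return one valid task order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


order = plan_order(DEPENDENCIES)
print(order)
```

Several orders satisfy these constraints, so the printed list is only one of them. A real plan would also need the arrival windows and role assignments, which this sketch leaves out.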

71
Mar 15, 2026 15:15

Analysis

Anthropic Claude Opus 4.6 VS Google Gemini 2.5 Flash

Choose the Best City Transit Upgrade

A city has a budget of $120 million to improve daily commuting over the next five years. Officials are considering three options and can fund only one.

Option A: Bus Rapid Transit
- Cost: $95 million
- Estimated daily riders affected: 70,000
- Average travel time reduction per affected rider: 9 minutes
- Construction disruption: moderate for 18 months
- Annual operating cost increase: low
- Equity impact: strong benefit for low-income neighborhoods
- Emissions impact: moderate reduction
- Risk: proven technology, low implementation risk

Option B: Light Rail Extension
- Cost: $120 million
- Estimated daily riders affected: 45,000
- Average travel time reduction per affected rider: 15 minutes
- Construction disruption: high for 36 months
- Annual operating cost increase: medium
- Equity impact: moderate benefit across mixed-income areas
- Emissions impact: strong reduction
- Risk: medium implementation risk due to land acquisition

Option C: Smart Traffic Signal System and Intersection Redesign
- Cost: $60 million
- Estimated daily riders affected: 110,000
- Average travel time reduction per affected rider: 4 minutes
- Construction disruption: low for 12 months
- Annual operating cost increase: low
- Equity impact: limited, benefits spread broadly but not targeted
- Emissions impact: small reduction
- Risk: low to medium risk because benefits depend on driver behavior and enforcement

Write a recommendation memo to the mayor choosing one option. Your analysis should compare the options using at least four relevant criteria, weigh trade-offs, address one reasonable counterargument to your choice, and end with a clear conclusion. Do not invent new data.
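
One way to compare the three options on a common scale is to multiply riders by minutes saved and then divide cost by that total. The sketch below uses only the figures given in the prompt. The derived metric (capital cost per daily rider-minute saved) is an illustrative assumption, not a criterion the prompt names, and it ignores operating cost, equity, emissions, and risk.

```python
# Figures copied from the prompt:
# (cost in $ millions, daily riders affected, minutes saved per affected rider)
OPTIONS = {
    "A: Bus Rapid Transit": (95, 70_000, 9),
    "B: Light Rail Extension": (120, 45_000, 15),
    "C: Smart Traffic Signals": (60, 110_000, 4),
}


def daily_minutes_saved(riders, minutes_per_rider):
    """Total rider-minutes saved per day across all affected riders."""
    return riders * minutes_per_rider


def cost_per_daily_minute(cost_millions, riders, minutes_per_rider):
    """Capital cost in dollars per daily rider-minute saved (illustrative metric)."""
    return cost_millions * 1_000_000 / daily_minutes_saved(riders, minutes_per_rider)


for name, (cost, riders, minutes) in OPTIONS.items():
    saved = daily_minutes_saved(riders, minutes)
    per_minute = cost_per_daily_minute(cost, riders, minutes)
    print(f"{name}: {saved:,} rider-minutes/day, ${per_minute:,.2f} per rider-minute")
```

On these numbers Option B saves the most total time per day (675,000 rider-minutes against 630,000 for A and 440,000 for C), while Option C has the lowest capital cost per rider-minute saved. The metric is only one input and does not settle the recommendation on its own.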

89
Mar 15, 2026 14:40

System Design

OpenAI GPT-5 mini VS Anthropic Claude Opus 4.6

Design a Real-Time E-commerce Notification System

You are a senior software engineer at a rapidly growing e-commerce company. Your task is to design a real-time notification system. This system should alert users about various events, such as order status updates (e.g., "shipped," "delivered"), price drops on items in their wishlist, and flash sale announcements.

Design a high-level architecture for this system. Your design should address the following requirements:

1. **High Throughput:** The system must handle up to 100,000 notifications per minute during peak times, like major sales events.
2. **Low Latency:** 99% of notifications should be delivered to the user's device within 5 seconds of the event occurring.
3. **Reliability:** The system must guarantee at-least-once delivery of notifications. No critical notification (like an order update) should be lost.
4. **Scalability:** The architecture should be able to scale horizontally to handle future growth in user base and notification volume.
5. **Personalization:** The system should support sending targeted notifications to specific user segments (e.g., users interested in a particular product category).

Describe your proposed architecture, including the key components and their interactions. Explain your choice of technologies (e.g., message queues, databases, push notification services). Justify your design decisions by discussing the trade-offs you considered, particularly regarding consistency, availability, and cost.
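
For sizing purposes, the throughput and latency requirements can be turned into per-second figures. The sketch below is a back-of-the-envelope calculation that assumes a steady arrival rate at the stated peak. It applies Little's law (items in the system = arrival rate × time in the system) to estimate how many notifications could be in flight if each one used the full 5-second budget. The function names are hypothetical.

```python
PEAK_NOTIFICATIONS_PER_MINUTE = 100_000  # from requirement 1
LATENCY_BUDGET_SECONDS = 5               # from requirement 2 (99% of notifications)


def per_second(per_minute):
    """Convert a per-minute rate to a per-second rate."""
    return per_minute / 60


def in_flight_at_full_budget(per_minute, budget_seconds):
    """Little's law estimate: arrival rate (per second) times time in the system.

    Assumes steady arrivals at the given rate and that every notification
    takes the whole latency budget, so this is a rough upper estimate.
    """
    return per_second(per_minute) * budget_seconds


rate = per_second(PEAK_NOTIFICATIONS_PER_MINUTE)
in_flight = in_flight_at_full_budget(PEAK_NOTIFICATIONS_PER_MINUTE, LATENCY_BUDGET_SECONDS)
print(f"Peak rate: about {rate:,.0f} notifications per second")
print(f"In flight at the full latency budget: about {in_flight:,.0f}")
```

The peak works out to roughly 1,667 notifications per second, with about 8,333 in flight at the full budget. Real traffic during a flash sale is bursty, so a design would normally leave headroom above these averages.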

68
Mar 15, 2026 11:23

Analysis

Anthropic Claude Opus 4.6 VS Google Gemini 2.5 Flash-Lite

Choose the Best Transit Upgrade for a Growing City

A city has a budget to fund only one of the following transportation projects this year. Analyze the options and recommend which project should be chosen.

City facts:
- Population: 620,000
- Average one-way commute: 34 minutes
- Car use for commuting: 58%
- Bus use: 24%
- Rail use: 8%
- Walking and cycling: 10%
- The city council wants a project that improves mobility, reduces congestion, and benefits lower-income residents.

Project A: Bus Rapid Transit corridor
- Cost: 180 million dollars
- Construction time: 3 years
- Expected daily riders added or shifted from current modes: 48,000
- Expected average commute time reduction for affected riders: 10 minutes
- Operating cost increase: moderate
- Serves 6 lower-income neighborhoods directly
- Requires converting two car lanes on a major road into dedicated bus lanes
- Risk: possible driver opposition and temporary construction disruption

Project B: New light rail extension
- Cost: 420 million dollars
- Construction time: 6 years
- Expected daily riders added or shifted from current modes: 36,000
- Expected average commute time reduction for affected riders: 14 minutes
- Operating cost increase: high
- Serves 2 lower-income neighborhoods directly and a growing business district
- Minimal impact on existing road lanes once completed
- Risk: cost overruns are fairly common in similar projects

Project C: Protected cycling network expansion
- Cost: 95 million dollars
- Construction time: 2 years
- Expected daily riders added or shifted from current modes: 22,000
- Expected average commute time reduction for affected riders: 6 minutes
- Operating cost increase: low
- Serves 4 lower-income neighborhoods directly
- Safety benefits expected for current cyclists as well
- Risk: benefits may be uneven across seasons and age groups

Write a concise analysis comparing the three options. Use the evidence provided, discuss trade-offs, and make a clear recommendation for the single best project for this year’s budget and goals. Do not invent extra facts.

77
Mar 15, 2026 05:59

Summarization

Anthropic Claude Opus 4.6 VS Google Gemini 2.5 Flash

Summarize a Policy Memo with Balanced Tradeoffs

Read the memo below and write a concise summary of 140 to 180 words for a city council member who has not read it. Your summary must cover the problem, the proposed pilot program, expected benefits, main risks or criticisms, and how success would be measured. Do not quote directly.

Memo:

Riverton's public buses have lost riders for six consecutive years, even though the city's population has grown. A transportation department review found several causes: routes are infrequent outside downtown, schedules are hard to understand, and buses are often delayed by traffic congestion. Low-income residents and older adults reported the greatest difficulty reaching jobs, clinics, and grocery stores without long waits or costly ride-hailing services.

In response, staff propose a two-year "Frequent Corridors" pilot. Instead of spreading service thinly across the entire network, the city would increase weekday frequency to every 10 minutes on five major corridors from 6 a.m. to 9 p.m. Two underused neighborhood routes would be replaced by on-demand shuttles that riders could book by phone or app. The plan would also add larger bus-stop signs, simplified maps, and a real-time arrival display at the central transfer station.

Supporters argue that riders value reliability and simplicity more than broad but infrequent coverage. They say concentrating resources on the busiest corridors could attract new riders, reduce missed transfers, and improve access to major employers and the community college. They also note that on-demand shuttles may serve low-density areas more efficiently than nearly empty fixed-route buses.

Critics raise several concerns. Some disability advocates worry that app-based booking could disadvantage riders without smartphones, although the proposal includes phone reservations. Labor representatives warn that the shuttle service could be outsourced later, potentially affecting union jobs. Environmental groups support transit investment overall but question whether replacing fixed routes with smaller vehicles might reduce total passenger capacity. Some residents also fear that neighborhoods losing direct bus lines will feel abandoned, even if average wait times fall.

The pilot is estimated to cost 8 million dollars over two years. Staff suggest funding it through a mix of state transit grants, parking revenue, and delaying a planned downtown streetscape project. They propose evaluating the pilot using ridership changes, average wait times, on-time performance, transfer success rates, customer satisfaction surveys, and access to essential destinations for low-income households. If the pilot fails to improve ridership and reliability within 18 months, staff recommend ending it early or redesigning it.

101
Mar 13, 2026 02:31
