Orivel

Latest Tasks & Discussions

Browse the latest benchmark content across tasks and discussions. Switch by genre to focus on what you want to compare.

Planning

Anthropic Claude Opus 4.6 VS Google Gemini 2.5 Pro

One-Day Community Fair Recovery Plan After a Storm

You are helping organize a small outdoor community fair scheduled for tomorrow from 10:00 to 16:00. A storm this morning damaged the site and created delays. Create a practical recovery plan for the organizers covering the time from 06:00 to 10:00 tomorrow so the fair can open as safely and smoothly as possible.

Situation:
- The fair has 12 vendor stalls, 1 small stage, a first-aid tent, portable toilets, and a check-in desk.
- The storm left muddy ground in several areas, knocked over 4 stall frames, and damaged the printed directional signs.
- Electricity is available from one generator, but it must be tested before any stage equipment or vendor refrigerators are connected.
- A safety inspection by the town officer must happen before the public enters.
- Volunteers available from 06:00 are: 4 setup volunteers, 2 logistics volunteers, and 1 coordinator. An electrician arrives at 07:30. The town safety officer may arrive any time between 08:30 and 09:30.
- A delivery truck bringing replacement signs and sandbags is expected at 08:00, but could be up to 30 minutes late.
- Two food vendors need power and at least 30 minutes to prepare before opening.
- One vendor has already said they may arrive as late as 09:45.
- Weather forecast for the morning: light rain possible between 07:00 and 08:00, then cloudy.

Constraints:
- No public entry before the safety inspection is complete.
- Muddy high-traffic areas should be stabilized before heavy equipment is moved across them.
- Generator testing must happen before powered equipment setup.
- The coordinator cannot do physical lifting but can communicate, schedule, and make decisions.
- At least one volunteer should remain free to handle unexpected issues whenever possible.

Your task: Provide a time-sequenced plan from 06:00 to 10:00 with priorities, task assignments by role, dependencies, and contingency actions for the uncertain delivery time, possible rain, late safety inspection, and the late vendor. Keep it concise but specific enough that another organizer could follow it.
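
The ordering constraints stated in this task can be read as a small dependency graph. The following is a minimal sketch, assuming Python 3.9+ for the standard-library `graphlib` module; the task labels are paraphrases chosen for illustration, and only dependencies stated explicitly in the prompt are encoded.

```python
from graphlib import TopologicalSorter

# Each key lists the tasks that must finish before it can start.
# Only orderings stated in the prompt are encoded; labels are illustrative.
dependencies = {
    "stabilize muddy high-traffic areas": set(),
    "move heavy equipment": {"stabilize muddy high-traffic areas"},
    "test generator": set(),
    "connect powered equipment": {"test generator"},
    "food vendor prep (30 min)": {"connect powered equipment"},
    "safety inspection": set(),
    "open to the public": {"safety inspection"},
}

# static_order() returns one valid sequence that respects every dependency.
order = list(TopologicalSorter(dependencies).static_order())

for step, task in enumerate(order, start=1):
    print(f"{step}. {task}")
```

Any sequence produced this way is only one of several valid orderings; it says nothing about clock times, staffing, or the contingencies the task also asks for.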

76
Mar 15, 2026 15:15

Planning

OpenAI GPT-5.2 VS Google Gemini 2.5 Pro

Emergency Shelter Setup Plan for a Sudden Flood Event

You are the emergency coordinator for a small rural town of 2,000 residents. A flash flood warning has been issued, and you have exactly 6 hours before the flood waters are expected to reach the town. You must plan the setup of an emergency shelter at the local high school gymnasium. Here are your available resources and constraints:

1. You have 15 volunteers, but only 3 have first-aid training.
2. The gymnasium can hold a maximum of 500 people.
3. You have access to 200 cots, 300 blankets, and a 48-hour supply of food and water for 400 people.
4. The town has only 2 school buses (capacity 50 each) and 5 pickup trucks for transport.
5. There are 3 neighborhoods in the flood zone: Riverside (300 residents, highest risk, 20 minutes away), Meadow Lane (200 residents, moderate risk, 10 minutes away), and Creek Side (150 residents, lower risk, 15 minutes away).
6. The town's cell tower may go down within 4 hours.
7. There are 40 known elderly or mobility-impaired residents spread across all three neighborhoods.
8. A backup generator is available but needs 1 hour to set up and test.
9. Roads to Riverside may become impassable within 3 hours.

Create a detailed, time-sequenced action plan covering the full 6-hour window. Your plan must address: evacuation prioritization and transport logistics, shelter preparation and resource allocation, communication strategy before and after potential cell tower failure, handling of vulnerable populations, risk mitigation for foreseeable complications, and contingency actions if key assumptions fail (e.g., roads close earlier than expected, more residents arrive than capacity allows).
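
The figures in this task support a quick feasibility check on bus transport and shelter capacity. The sketch below is illustrative only. It assumes that "minutes away" means one-way driving time to the shelter, that both buses travel together and start from the shelter, and that loading time and the pickup trucks are ignored; none of these assumptions come from the prompt.

```python
import math

# Figures from the prompt: residents and distance in minutes.
neighborhoods = {
    "Riverside": {"residents": 300, "one_way_min": 20},
    "Meadow Lane": {"residents": 200, "one_way_min": 10},
    "Creek Side": {"residents": 150, "one_way_min": 15},
}
BUS_SEATS_PER_ROUND = 2 * 50      # two buses, 50 seats each
GYM_CAPACITY = 500
COTS = 200
FOOD_SUPPLY_PEOPLE = 400
RIVERSIDE_ROAD_WINDOW_MIN = 3 * 60


def bus_rounds(residents):
    """Round trips needed if both buses run together (ASSUMPTION)."""
    return math.ceil(residents / BUS_SEATS_PER_ROUND)


def driving_minutes(area):
    """Out-and-back driving time only; loading time is ignored (ASSUMPTION)."""
    return bus_rounds(area["residents"]) * 2 * area["one_way_min"]


total_residents = sum(a["residents"] for a in neighborhoods.values())
riverside_minutes = driving_minutes(neighborhoods["Riverside"])

print("Flood-zone residents:", total_residents)
print("Over gym capacity by:", total_residents - GYM_CAPACITY)
print("Without a cot if all arrive:", total_residents - COTS)
print("Beyond food supply by:", total_residents - FOOD_SUPPLY_PEOPLE)
print("Riverside driving minutes:", riverside_minutes,
      "of", RIVERSIDE_ROAD_WINDOW_MIN, "available")
```

Under these assumptions the Riverside evacuation fits inside the 3-hour road window with about an hour of slack, while total flood-zone population exceeds gymnasium capacity, which is one reason the task asks for overflow contingencies.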

80
Mar 15, 2026 15:03

Analysis

Anthropic Claude Opus 4.6 VS Google Gemini 2.5 Flash

Choose the Best City Transit Upgrade

A city has a budget of $120 million to improve daily commuting over the next five years. Officials are considering three options and can fund only one.

Option A: Bus Rapid Transit
- Cost: $95 million
- Estimated daily riders affected: 70,000
- Average travel time reduction per affected rider: 9 minutes
- Construction disruption: moderate for 18 months
- Annual operating cost increase: low
- Equity impact: strong benefit for low-income neighborhoods
- Emissions impact: moderate reduction
- Risk: proven technology, low implementation risk

Option B: Light Rail Extension
- Cost: $120 million
- Estimated daily riders affected: 45,000
- Average travel time reduction per affected rider: 15 minutes
- Construction disruption: high for 36 months
- Annual operating cost increase: medium
- Equity impact: moderate benefit across mixed-income areas
- Emissions impact: strong reduction
- Risk: medium implementation risk due to land acquisition

Option C: Smart Traffic Signal System and Intersection Redesign
- Cost: $60 million
- Estimated daily riders affected: 110,000
- Average travel time reduction per affected rider: 4 minutes
- Construction disruption: low for 12 months
- Annual operating cost increase: low
- Equity impact: limited, benefits spread broadly but not targeted
- Emissions impact: small reduction
- Risk: low to medium risk because benefits depend on driver behavior and enforcement

Write a recommendation memo to the mayor choosing one option. Your analysis should compare the options using at least four relevant criteria, weigh trade-offs, address one reasonable counterargument to your choice, and end with a clear conclusion. Do not invent new data.
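
One way to compare the options quantitatively without inventing data is to multiply riders by minutes saved. The sketch below is a possible derived metric, not part of the task itself; the dictionary keys and function names are hypothetical, and the inputs are copied from the option descriptions above.

```python
# Inputs copied from the prompt; nothing new is introduced.
options = {
    "A": {"name": "Bus Rapid Transit", "cost_usd": 95_000_000,
          "riders": 70_000, "minutes_saved": 9},
    "B": {"name": "Light Rail Extension", "cost_usd": 120_000_000,
          "riders": 45_000, "minutes_saved": 15},
    "C": {"name": "Smart Traffic Signals", "cost_usd": 60_000_000,
          "riders": 110_000, "minutes_saved": 4},
}


def daily_rider_minutes(option):
    """Total minutes saved per day across all affected riders."""
    return option["riders"] * option["minutes_saved"]


def capital_cost_per_rider_minute(option):
    """Up-front cost divided by daily rider-minutes saved (lower is better)."""
    return option["cost_usd"] / daily_rider_minutes(option)


for key, option in options.items():
    print(f"Option {key} ({option['name']}): "
          f"{daily_rider_minutes(option):,} rider-minutes/day, "
          f"${capital_cost_per_rider_minute(option):,.2f} per daily rider-minute")
```

This metric covers only time savings and capital cost. Equity, emissions, disruption, operating cost, and risk are qualitative in the prompt and would still need to be weighed in the memo.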

94
Mar 15, 2026 14:40

Summarization

Anthropic Claude Haiku 4.5 VS Google Gemini 2.5 Flash-Lite

Summarize a policy debate on urban cooling

Read the following passage and write a concise summary of 180 to 230 words. Your summary must be written in neutral language for a general audience. It must preserve the main problem being discussed, the competing proposals, the evidence and trade-offs mentioned, the pilot-program results, the financing debate, and the final compromise. Do not use direct quotations. Do not add information that is not in the passage.

Source passage:

The city of Lydon has spent the last four summers breaking local heat records, and the pattern has begun to alter daily life in visible ways. Schools have canceled afternoon sports, emergency rooms report spikes in dehydration among older residents, and bus drivers complain that cabin temperatures remain dangerous even with windows open. In the central districts, where dark roofs, asphalt, and sparse tree cover trap heat, nighttime temperatures can stay several degrees higher than those in the surrounding countryside. Public concern intensified after a weeklong heat wave coincided with a regional power shortage, forcing some apartment buildings to limit air-conditioning use. In response, the mayor asked the city council to choose a long-term strategy for reducing heat exposure rather than relying only on emergency cooling centers.

Two broad camps quickly emerged. One coalition, made up largely of public health officials, neighborhood groups, and several architects, argued for a citywide program of cool roofs and reflective pavement. Their case was straightforward: these surfaces absorb less solar radiation and can lower ambient temperatures relatively quickly, especially in the hardest-hit blocks. They also noted that installation can be targeted to public buildings, schools, bus depots, and major walking corridors where exposure is highest. To them, speed mattered. Heat was already killing vulnerable residents, and they believed the city should prioritize interventions that can be deployed within one or two budget cycles. Some supporters also claimed that cooler surfaces could reduce electricity demand by lowering indoor temperatures in top-floor apartments.

A second coalition, including parks planners, ecologists, and some business leaders, favored a massive expansion of the city’s tree canopy. They argued that trees provide shade, improve air quality, absorb stormwater, and make streets more pleasant in ways that reflective surfaces alone cannot. For this group, the heat problem was inseparable from broader questions of livability and environmental inequality. Several low-income neighborhoods with the fewest trees also had the least access to parks and the highest rates of asthma. Planting thousands of trees, they said, would address heat while producing multiple long-term public benefits. They acknowledged that young trees take years to mature, but insisted that the city should not choose short-term fixes that fail to improve public space over decades.

As the debate widened, practical objections complicated both visions. Engineers warned that reflective pavement does not behave the same in every location. On narrow streets lined with glass-fronted buildings, some materials can bounce sunlight toward pedestrians or storefronts, creating glare and increasing discomfort at certain hours. Maintenance crews added that reflective coatings wear unevenly under heavy bus traffic and may require frequent reapplication, especially after snowplows and winter salting. At the same time, arborists cautioned that large-scale tree planting is not as simple as digging holes and placing saplings. Many of Lydon’s hottest blocks have compacted soil, buried utility lines, and little room for roots. Without irrigation in the first years, mortality rates can be high, particularly as summers become drier. In other words, neither solution was as effortless as its champions first suggested.

Because the council was divided, the mayor’s office launched a twelve-month pilot program in three neighborhoods with different physical conditions. The Riverside district received cool roofs on municipal buildings and a reflective coating on several bus stops and sidewalks. Midvale, a mixed residential area with wider streets, received 1,200 trees, soil improvements, and a volunteer watering network coordinated through local schools. The third area, South Market, received a hybrid package: shade structures at transit stops, reflective roofs on two public housing complexes, and targeted tree planting around playgrounds and senior centers. Researchers from the local university monitored surface temperatures, nighttime air temperatures, pedestrian counts, maintenance costs, and resident satisfaction.

The results gave each side reasons to celebrate and reasons to retreat. In Riverside, roof temperatures dropped sharply, and several school buildings used less electricity during hot months than the previous year. Sidewalk measurements also showed cooler surface readings in treated areas. However, complaints about afternoon glare were more frequent than planners expected near a row of renovated commercial facades, and the transit authority reported that re-coating high-wear bus zones would cost more than initial estimates. In Midvale, residents praised the neighborhood’s appearance and reported feeling more comfortable on shaded streets, but because most trees were newly planted, measurable reductions in average air temperature were modest during the first summer. Tree survival was better than forecast, largely because the school-based watering network was unusually active, leading critics to question whether the model would scale citywide.

South Market’s mixed approach produced the most politically useful findings. The shade structures immediately increased transit use at two exposed stops during hot afternoons, according to ridership data, and seniors at the housing complexes reported lower indoor temperatures after roof treatments. Meanwhile, trees around playgrounds did not yet alter neighborhood-wide temperatures but noticeably changed how long families stayed outdoors in the early evening. The university team concluded that the city had been framing the issue too narrowly. Instead of asking which single intervention “wins,” they suggested matching tools to place: reflective materials where quick thermal relief and energy savings are priorities, trees where there is room for canopy growth and co-benefits justify slower returns, and built shade where neither approach can perform quickly enough on its own.

Financing then became the central battleground. The city budget office estimated that a rapid cool-roof and reflective-surface program would produce visible results sooner, but with recurring maintenance obligations. The forestry department argued that tree investments looked expensive up front only because accounting methods captured planting and early care immediately while undervaluing decades of shade, stormwater reduction, and health benefits. Meanwhile, tenant advocates pushed the council to focus on renters in top-floor units and in poorly insulated buildings, arguing that any city plan should reduce indoor heat burden, not just outdoor temperatures. Business associations supported interventions around shopping corridors and transit nodes, saying extreme heat was reducing foot traffic and worker productivity. No coalition could finance its preferred approach fully without delaying other infrastructure repairs.

Public hearings revealed deeper disagreements about fairness. Some residents from wealthier districts said their tax contributions should not be diverted mainly to neighborhoods with older housing and less tree cover. Speakers from hotter districts replied that these same inequalities were the result of decades of underinvestment and planning decisions that favored leafy, low-density areas. Disability advocates emphasized that walking distance to shade, benches, and bus stops mattered as much as citywide temperature averages. Several parents requested immediate protections at schools and playgrounds, while labor groups representing outdoor workers demanded more shaded break areas and cooler pavement on routes used for deliveries and street maintenance. The council began to see that the issue was not only environmental but also social: who gets relief first, and by what measure of need?

After months of negotiation, the council rejected both all-roof and all-tree plans. Instead, it adopted a phased Heat Resilience Package. Phase one funds cool roofs for schools, public housing, and senior facilities; shade structures and drinking fountains at transit stops with high heat exposure; and targeted reflective treatments only in locations screened for glare risk. Phase two funds tree planting on residential streets and around parks, but only where soil volume, maintenance capacity, and water access meet minimum standards. To address equity concerns, the city created a heat-vulnerability index that combines temperature data, age distribution, income, existing canopy, and rates of heat-related emergency calls. Neighborhoods scoring highest on the index move to the front of the line for both phases. The package also sets aside money for monitoring so that unsuccessful materials or planting methods can be revised rather than repeated.

The final vote satisfied almost no one completely, which was perhaps why it passed. Public health groups thought the tree component remained too slow; canopy advocates disliked the continued role of reflective materials; fiscal conservatives objected to the monitoring budget; and some residents worried that visible improvements in overheated districts could raise rents over time. Even so, a broad majority accepted the package as more realistic than the simple alternatives. The mayor called it a shift from symbolic climate action to practical risk reduction. Whether Lydon’s plan becomes a model for other cities will depend less on slogans than on maintenance, measurement, and the city’s willingness to adjust when early assumptions prove wrong.

71
Mar 15, 2026 13:43

System Design

OpenAI GPT-5 mini VS Anthropic Claude Opus 4.6

Design a Real-Time E-commerce Notification System

You are a senior software engineer at a rapidly growing e-commerce company. Your task is to design a real-time notification system. This system should alert users about various events, such as order status updates (e.g., "shipped," "delivered"), price drops on items in their wishlist, and flash sale announcements. Design a high-level architecture for this system. Your design should address the following requirements:

1. **High Throughput:** The system must handle up to 100,000 notifications per minute during peak times, like major sales events.
2. **Low Latency:** 99% of notifications should be delivered to the user's device within 5 seconds of the event occurring.
3. **Reliability:** The system must guarantee at-least-once delivery of notifications. No critical notification (like an order update) should be lost.
4. **Scalability:** The architecture should be able to scale horizontally to handle future growth in user base and notification volume.
5. **Personalization:** The system should support sending targeted notifications to specific user segments (e.g., users interested in a particular product category).

Describe your proposed architecture, including the key components and their interactions. Explain your choice of technologies (e.g., message queues, databases, push notification services). Justify your design decisions by discussing the trade-offs you considered, particularly regarding consistency, availability, and cost.
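
The at-least-once requirement in this task implies that a notification may be delivered more than once, so receivers typically deduplicate by a stable ID. The following is a toy, in-memory sketch of that idea, not a proposed architecture: `ToyQueue` and `ToyDevice` are hypothetical stand-ins for a durable message queue and a client device, and the retry limit of 5 is an arbitrary assumption.

```python
from collections import deque

PEAK_PER_MINUTE = 100_000               # from the throughput requirement
PEAK_PER_SECOND = PEAK_PER_MINUTE / 60  # roughly 1,667 notifications/second


class ToyQueue:
    """Hypothetical stand-in for a durable queue with redelivery."""

    def __init__(self, max_attempts=5):  # retry limit is an assumption
        self._pending = deque()
        self.dead_letters = []
        self._max_attempts = max_attempts

    def publish(self, message):
        self._pending.append({**message, "attempts": 0})

    def drain(self, handler):
        """Redeliver each message until the handler acknowledges it."""
        while self._pending:
            message = self._pending.popleft()
            message["attempts"] += 1
            if handler(message):
                continue  # acknowledged, safe to drop
            if message["attempts"] < self._max_attempts:
                self._pending.append(message)      # retry later
            else:
                self.dead_letters.append(message)  # keep for inspection


class ToyDevice:
    """Hypothetical receiver that deduplicates by notification ID."""

    def __init__(self, lose_first_ack=False):
        self.shown = []
        self.deliveries = 0
        self._seen_ids = set()
        self._lose_next_ack = lose_first_ack

    def receive(self, message):
        self.deliveries += 1
        if message["id"] not in self._seen_ids:
            self._seen_ids.add(message["id"])
            self.shown.append(message["text"])
        if self._lose_next_ack:
            self._lose_next_ack = False
            return False  # simulate an acknowledgement lost in transit
        return True


queue = ToyQueue()
device = ToyDevice(lose_first_ack=True)
queue.publish({"id": "order-1001-shipped", "text": "Your order has shipped"})
queue.drain(device.receive)
print(device.deliveries, device.shown)
```

In this run the message is delivered twice because the first acknowledgement is lost, but the user sees it once. A real design would replace both classes with durable, horizontally scalable components, which is the choice the task asks candidates to justify.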

75
Mar 15, 2026 11:23

Planning

OpenAI GPT-5 mini VS Google Gemini 2.5 Flash-Lite

Emergency Shelter Setup Plan Under Resource and Time Constraints

You are the logistics coordinator for a disaster relief organization. A sudden earthquake has displaced 500 families in a rural area. You must plan the setup of an emergency shelter camp within 72 hours. You have the following constraints:

1. Only 300 tents are available immediately; an additional 250 can arrive in 48 hours but delivery is weather-dependent (40% chance of delay by another 24 hours).
2. You have 15 volunteers and 5 professional staff members.
3. The identified site has two possible locations: Site A is flat and accessible but near a river with moderate flood risk; Site B is on higher ground but requires 6 hours of debris clearing before setup can begin.
4. Potable water supply trucks can make 3 trips per day, each serving 200 families.
5. Local authorities require a safety inspection before families can occupy the camp, which takes 8 hours after setup is complete.
6. Nighttime work is possible but reduces productivity by 50%.
7. You have a budget of $20,000 for immediate expenses (fuel, food for workers, basic medical supplies, miscellaneous).

Create a detailed 72-hour action plan that addresses the following:
- Site selection with justification
- Phased shelter deployment (accounting for the tent shortage and delivery uncertainty)
- Volunteer and staff task allocation
- Water distribution scheduling
- Risk mitigation strategies for at least three identified risks
- Budget allocation breakdown
- A contingency plan if the second tent shipment is delayed

Present your plan in a clear, structured format with time blocks and decision points.
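
Several of the constraints in this task reduce to simple arithmetic that bounds any plan. The sketch below works through them under two labeled assumptions that are not in the prompt: one tent shelters one family, and each water trip covers the daily need of 200 families.

```python
# Figures from the prompt.
FAMILIES = 500
TENTS_NOW = 300
TENTS_SECOND_SHIPMENT = 250
SHIPMENT_ETA_H = 48
SHIPMENT_DELAY_H = 24
P_DELAY = 0.40
WATER_TRIPS_PER_DAY = 3
FAMILIES_PER_TRIP = 200
INSPECTION_H = 8
WINDOW_H = 72

FAMILIES_PER_TENT = 1  # ASSUMPTION: not stated in the prompt

# Water: daily capacity versus demand (ASSUMPTION: a trip covers a day's need).
water_capacity_per_day = WATER_TRIPS_PER_DAY * FAMILIES_PER_TRIP
water_surplus = water_capacity_per_day - FAMILIES

# Tents: who can be sheltered before the second shipment arrives.
sheltered_now = min(FAMILIES, TENTS_NOW * FAMILIES_PER_TENT)
waiting_for_second_shipment = FAMILIES - sheltered_now

# Expected arrival hour of the second shipment, weighting the 40% delay risk.
delayed_arrival_h = SHIPMENT_ETA_H + SHIPMENT_DELAY_H
expected_arrival_h = (1 - P_DELAY) * SHIPMENT_ETA_H + P_DELAY * delayed_arrival_h

# Setup must finish this early for the 8-hour inspection to fit in the window.
latest_setup_finish_h = WINDOW_H - INSPECTION_H
delayed_shipment_misses_window = delayed_arrival_h > latest_setup_finish_h

print("Water capacity per day (families):", water_capacity_per_day)
print("Families waiting for second shipment:", waiting_for_second_shipment)
print("Expected shipment arrival (h):", round(expected_arrival_h, 1))
print("Latest setup finish for inspection (h):", latest_setup_finish_h)
print("Delayed shipment misses 72 h window:", delayed_shipment_misses_window)
```

Under these assumptions a delayed shipment arrives at hour 72, after the latest point at which setup could finish and still be inspected, so the contingency plan has to cover the remaining families by other means.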

75
Mar 15, 2026 09:41

Showing 201 to 220 of 333 results
