Analysis
Explore how AI models perform in Analysis. Compare rankings, scoring criteria, and recent benchmark examples.
Genre overview
Compare depth, reasoning quality, and clarity in analytical responses.
In this genre, the main abilities tested are Depth, Correctness, and Reasoning Quality.
Unlike explanation, this genre rewards evidence reading and justified conclusions more than audience-friendly teaching style.
A high score here does not guarantee concise writing, strong humor, or practical execution details.
Strong models here are useful for option review, evidence comparison, decision support, and risk assessment.
This genre alone cannot tell you whether the model can implement code well, write polished business documents, or produce many creative ideas.
Top Models in This Genre
This ranking covers this genre only. The rows below are ordered by win rate, with average score separating models that have the same win rate.
Last updated: Apr 18, 2026 13:39
| Rank | Model | Provider | Win Rate | Average Score | Wins | Matches | Detail |
|---|---|---|---|---|---|---|---|
| #1 | GPT-5.4 (NEW) | OpenAI | 100% | 87 | 4 | 4 | View scores and evaluation for GPT-5.4 |
| #2 | GPT-5.2 (Retired) | OpenAI | 100% | 87 | 4 | 4 | View scores and evaluation for GPT-5.2 |
| #3 | Claude Opus 4.7 (NEW) | Anthropic | 100% | 86 | 1 | 1 | View scores and evaluation for Claude Opus 4.7 |
| #4 | Claude Opus 4.6 (Retired) | Anthropic | 75% | 87 | 3 | 4 | View scores and evaluation for Claude Opus 4.6 |
| #5 | GPT-5 mini | OpenAI | 75% | 83 | 3 | 4 | View scores and evaluation for GPT-5 mini |
| #6 | Claude Sonnet 4.6 | Anthropic | 60% | 83 | 3 | 5 | View scores and evaluation for Claude Sonnet 4.6 |
| #7 | Claude Haiku 4.5 | Anthropic | 50% | 83 | 2 | 4 | View scores and evaluation for Claude Haiku 4.5 |
| #8 | Gemini 2.5 Flash-Lite | Google | 0% | 76 | 0 | 5 | View scores and evaluation for Gemini 2.5 Flash-Lite |
| #9 | Gemini 2.5 Flash | Google | 0% | 76 | 0 | 5 | View scores and evaluation for Gemini 2.5 Flash |
| #10 | Gemini 2.5 Pro | Google | 0% | 73 | 0 | 4 | View scores and evaluation for Gemini 2.5 Pro |
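The relationship between the columns can be checked with a short sketch. It rests on two assumptions that the page does not state: the two small integers in each row are wins and matches played (which matches every listed win rate), and rows are sorted by win rate first and average score second. The `Row` type and the function names are illustrative only.

```python
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    avg_score: int
    wins: int      # assumed meaning of the first integer column
    matches: int   # assumed meaning of the second integer column


# Values copied from the ranking table, in the order shown there.
ROWS = [
    Row("GPT-5.4", 87, 4, 4),
    Row("GPT-5.2", 87, 4, 4),
    Row("Claude Opus 4.7", 86, 1, 1),
    Row("Claude Opus 4.6", 87, 3, 4),
    Row("GPT-5 mini", 83, 3, 4),
    Row("Claude Sonnet 4.6", 83, 3, 5),
    Row("Claude Haiku 4.5", 83, 2, 4),
    Row("Gemini 2.5 Flash-Lite", 76, 0, 5),
    Row("Gemini 2.5 Flash", 76, 0, 5),
    Row("Gemini 2.5 Pro", 73, 0, 4),
]


def win_rate(row: Row) -> float:
    """Win rate in percent, assuming wins / matches."""
    return 100 * row.wins / row.matches


def rank(rows: list[Row]) -> list[Row]:
    """Sort by win rate, then average score, both descending.

    sorted() is stable, so rows that tie on both keys keep their input order.
    """
    return sorted(rows, key=lambda r: (-win_rate(r), -r.avg_score))


if __name__ == "__main__":
    for position, row in enumerate(rank(ROWS), start=1):
        print(f"#{position} {row.model}: {win_rate(row):.0f}% / {row.avg_score}")
```

Under these assumptions the sort reproduces the listed order, which explains why Claude Opus 4.6 (average 87, 75% win rate) sits below Claude Opus 4.7 (average 86, 100% win rate). Note that the match counts are small, so the win rates are coarse.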
What Is Evaluated in Analysis
Scoring criteria and weights used for this genre ranking.
Depth
25.0%
This criterion is included to check Depth in the answer. It carries heavier weight because this part strongly shapes the overall result in this genre.
Correctness
25.0%
This criterion is included to check Correctness in the answer. At 25.0% it is weighted as heavily as Depth, because it affects quality in a visible way, even if it is not the only thing that matters.
Reasoning Quality
20.0%
This criterion is included to check Reasoning Quality in the answer. It has meaningful weight because it affects quality in a visible way, even if it is not the only thing that matters.
Structure
15.0%
This criterion is included to check Structure in the answer. It is weighted more lightly because it supports the main goal rather than defining the genre by itself.
Clarity
15.0%
This criterion is included to check Clarity in the answer. It is weighted more lightly because it supports the main goal rather than defining the genre by itself.
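The page lists the weights but not how they are combined. A minimal sketch, assuming the overall score is a plain weighted average of per-criterion scores on a 0 to 100 scale, might look like this. The function name and the example scores are hypothetical, and only the weights come from the list above.

```python
# Weights in percent, copied from the criteria list above.
WEIGHTS_PCT = {
    "Depth": 25,
    "Correctness": 25,
    "Reasoning Quality": 20,
    "Structure": 15,
    "Clarity": 15,
}


def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-criterion scores (0-100) into one score.

    Assumption: a simple weighted average. The page does not confirm
    that this is the formula actually used.
    """
    if set(scores) != set(WEIGHTS_PCT):
        raise ValueError("scores must cover exactly the five criteria")
    return sum(WEIGHTS_PCT[name] * scores[name] for name in WEIGHTS_PCT) / 100


if __name__ == "__main__":
    # Hypothetical per-criterion scores for one answer.
    example = {
        "Depth": 90,
        "Correctness": 85,
        "Reasoning Quality": 80,
        "Structure": 70,
        "Clarity": 75,
    }
    print(weighted_score(example))
```

With these weights, Depth, Correctness, and Reasoning Quality together account for 70% of the result, so a weak Structure or Clarity score moves the total less than an equally weak Depth score.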
Recent tasks
Analysis
Choose the Best Transit Upgrade for a Growing City
A city has a budget to fund only one transportation project this year. Analyze the options below and recommend which single project the city should choose. Your answer should compare the trade-offs, identify the strongest and weakest evidence for each option, and reach a clear conclusion.

City facts:
- Population: 600,000
- Current problems: traffic congestion during rush hour, unreliable bus arrival times, and rising transportation emissions
- Budget available this year: up to $120 million
- The city wants a project that shows noticeable benefits within 3 years

Option A: Bus Rapid Transit corridor
- Cost: $95 million
- Construction time: 2 years
- Expected daily riders added or shifted from cars: 38,000
- Estimated commute time improvement on corridor: 18%
- Emissions impact: moderate reduction
- Risk: requires taking one car lane away on two major roads, which may face political resistance

Option B: Light rail extension
- Cost: $120 million
- Construction time: 5 years
- Expected daily riders added or shifted from cars: 52,000
- Estimated commute time improvement on served corridor: 25%
- Emissions impact: strong reduction
- Risk: higher construction disruption and no major benefits visible within the first 3 years

Option C: Smart traffic signals plus bus-priority system
- Cost: $45 million
- Construction time: 1 year
- Expected daily riders added or shifted from cars: 15,000
- Estimated citywide bus reliability improvement: 22%
- Emissions impact: small-to-moderate reduction
- Risk: benefits may be spread out and less visible to the public than a new line or corridor

Option D: Protected bike lane network expansion
- Cost: $70 million
- Construction time: 2 years
- Expected daily riders added or shifted from cars: 20,000
- Estimated health and safety benefit: high
- Emissions impact: moderate reduction
- Risk: usage may vary by season and some neighborhoods argue the plan is unevenly distributed

Write an analysis that recommends one option. You should consider at least these criteria: budget fit, speed of benefits, likely impact, implementation risk, and alignment with the city's stated goals. If you make assumptions, state them clearly.
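As an illustration of the arithmetic this task invites, the sketch below screens the four options against the stated budget and 3-year window and computes cost per expected daily rider. The screening rule (construction must finish in under 3 years) and the cost-per-rider yardstick are assumptions chosen for the example, not part of the task, and all names are hypothetical.

```python
from dataclasses import dataclass

BUDGET_MUSD = 120   # budget from the city facts, in millions of dollars
HORIZON_YEARS = 3   # benefits should be noticeable within this window


@dataclass(frozen=True)
class Option:
    name: str
    cost_musd: int      # cost in millions of dollars
    build_years: int    # construction time
    daily_riders: int   # expected daily riders added or shifted from cars


OPTIONS = [
    Option("A", 95, 2, 38_000),
    Option("B", 120, 5, 52_000),
    Option("C", 45, 1, 15_000),
    Option("D", 70, 2, 20_000),
]


def cost_per_rider(option: Option) -> float:
    """Capital cost in dollars per expected daily rider."""
    return option.cost_musd * 1_000_000 / option.daily_riders


def meets_constraints(option: Option) -> bool:
    """Assumed screen: within budget and built before the horizon ends."""
    return option.cost_musd <= BUDGET_MUSD and option.build_years < HORIZON_YEARS


if __name__ == "__main__":
    for option in OPTIONS:
        status = "ok" if meets_constraints(option) else "misses 3-year window or budget"
        print(f"Option {option.name}: ${cost_per_rider(option):,.0f} per daily rider ({status})")
```

A single ratio like this covers only budget fit and likely impact. The task also asks for implementation risk and alignment with the city's goals, which need qualitative judgment.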
Analysis
Urban Transit Policy Analysis
Analyze the three proposed transit policies for the fictional city of Riverbend. Based on the provided context, recommend the best policy for the city's long-term future. Your analysis should compare the options across key factors like cost, environmental impact, public acceptance, and effectiveness in reducing congestion. Justify your final recommendation with a clear, evidence-based argument.
Analysis
Select the Most Effective School Attendance Intervention
A public middle school has a budget to fund one pilot program for the next academic year to reduce chronic absenteeism. Chronic absenteeism is defined here as missing 10% or more of school days. The school serves 600 students, and currently 18% are chronically absent. The principal wants the option that is most likely to reduce absenteeism in a meaningful and sustainable way within one year. The school is considering these three options:

Option A: Daily text-message reminders and attendance alerts
- Cost: $18,000 for software and staff time
- Target group: all families
- Evidence from similar districts: chronic absenteeism fell by 1.5 percentage points on average
- Risks: message fatigue, outdated phone numbers, limited effect for families facing serious barriers
- Operational notes: can be launched quickly and scaled easily

Option B: Two additional school social workers focused on high-risk students
- Cost: $95,000 for one year
- Target group: roughly 90 students with the highest absence rates
- Evidence from similar schools: among targeted students, average attendance improved enough to reduce schoolwide chronic absenteeism by about 4 percentage points when implementation was strong
- Risks: recruiting delays, benefits may depend heavily on staff quality, hard to sustain if grant funding ends
- Operational notes: allows individualized support for transportation, family crises, mental health, and housing instability

Option C: Free morning shuttle routes from two neighborhoods with poor attendance
- Cost: $52,000 for one year
- Target group: about 140 students in neighborhoods with low car ownership and unreliable public transit
- Evidence from similar programs: schoolwide chronic absenteeism fell by 2.5 percentage points on average where transportation was a major barrier
- Risks: only addresses one cause of absence, route design may miss some students, ongoing operating costs
- Operational notes: visible program, may improve punctuality as well as attendance

Additional context:
- A recent internal survey suggests the main reported reasons for absence are: transportation problems (30%), illness or caregiving duties (25%), anxiety or mental health concerns (20%), family instability such as housing or frequent moves (15%), and disengagement or other reasons (10%).
- The school has one part-time counselor already, but no dedicated attendance team.
- The district can likely continue funding a successful program next year only if the first-year results are clearly visible.

Task: Analyze the three options and recommend the single best pilot program. Your answer should compare trade-offs, consider the quality and limits of the evidence, and explain why your chosen option is better than the alternatives in this specific context.
Analysis
Analysis of a Four-Day Work Week Policy for a City
The city of Rivertown, a mid-sized municipality with approximately 2,000 city employees, is considering a proposal to switch to a four-day work week. Under this proposal, employees would work four 10-hour days instead of five 8-hour days, with no reduction in their weekly pay or benefits. The stated goals are to improve employee morale and work-life balance, attract and retain top talent in a competitive job market, and maintain or even increase overall productivity. Analyze the potential positive and negative consequences of this policy for Rivertown. Your analysis should consider the impacts on city services, the municipal budget, employee well-being, and the local economy. Conclude with a clear, justified recommendation on whether Rivertown should implement this policy, perhaps starting with a limited pilot program.
Analysis
Rivertown Congestion Charge Policy Analysis
The city council of Rivertown, a mid-sized city with a population of 500,000, is considering implementing a congestion charge. This would require drivers to pay a fee to enter the downtown business district between 7 AM and 7 PM on weekdays. The stated goals are to reduce traffic congestion, lower air pollution, and generate revenue for improving public transportation (buses and a new light rail line). Analyze the potential positive and negative consequences of this proposed policy. Your analysis should consider the impact on at least three different groups of people (e.g., downtown business owners, low-income commuters who drive to work, suburban families, environmental groups). Conclude with a clear, justified recommendation on whether Rivertown should implement the congestion charge, perhaps with specific suggestions for how to mitigate the negative impacts.
Analysis
Analyze a Proposed City Ordinance on Plastic Bags
You are a neutral policy analyst for the Rivertown City Council. Based on the provided context, write an analysis of the proposed ban on single-use plastic bags. Your analysis should:
1. Evaluate the potential environmental, economic, and social impacts of the ban.
2. Assess the arguments presented by both the 'Friends of the Rivertown River' and the 'Rivertown Small Business Alliance'.
3. Conclude with a clear, justified recommendation to the City Council. Your recommendation could be to pass the ordinance as is, reject it, or suggest specific modifications.