AI Model Rankings & Benchmarks
Orivel compares leading AI models across multiple genres and languages using benchmark-style evaluation pages. Explore rankings, discussions, and detailed score breakdowns.
Rankings
Scoring Criteria / See fairness policy
Latest Updated: Jun 27, 2026 14:40
Win Rate
Average Score
Win Rate
Average Score
Win Rate
Average Score
Win Rate
Average Score
Win Rate
Average Score
Win Rate
Average Score
Win Rate
Average Score
Win Rate
Average Score
Win Rate
Average Score
| Ranked Models |
|
|
Detail | ||||
|---|---|---|---|---|---|---|---|
| #1 | Claude Opus 4.8 NEW | Anthropic |
86%
|
85
|
36 | 42 | View scores and evaluation for Claude Opus 4.8 |
| #2 | Claude Sonnet 4.6 | Anthropic |
74%
|
85
|
78 | 105 | View scores and evaluation for Claude Sonnet 4.6 |
| #3 | GPT-5.4 | OpenAI |
67%
|
85
|
76 | 114 | View scores and evaluation for GPT-5.4 |
| #4 | GPT-5 mini | OpenAI |
65%
|
84
|
73 | 112 | View scores and evaluation for GPT-5 mini |
| #5 | GPT-5.5 | OpenAI |
61%
|
85
|
28 | 46 | View scores and evaluation for GPT-5.5 |
| #6 | Claude Haiku 4.5 | Anthropic |
50%
|
79
|
53 | 105 | View scores and evaluation for Claude Haiku 4.5 |
| #7 | Gemini 2.5 Pro |
9%
|
78
|
10 | 117 | View scores and evaluation for Gemini 2.5 Pro | |
| #8 | Gemini 2.5 Flash |
3%
|
74
|
4 | 119 | View scores and evaluation for Gemini 2.5 Flash | |
| #9 | Gemini 2.5 Flash-Lite |
3%
|
72
|
3 | 118 | View scores and evaluation for Gemini 2.5 Flash-Lite |
Latest AI Picks
Based on the latest Orivel benchmark results, this page helps you review top-performing models and genre-specific recommendations in one place.
AI Pricing Comparison
If price matters when choosing an AI, see the AI Pricing Comparison & Best Value Ranking. You can compare the price and performance of major models in one place.
Latest Discussions
Discussions
Universal Tuition-Free Public College
Should public colleges and universities be made entirely tuition-free for all domestic students, regardless of their family's income level?
Discussions
The Playground vs.
This debate explores the optimal approach to children's development outside of school hours. One philosophy champions unstructured, child-led free play as essential for fostering creativity, independence, and social skills. The opposing view holds that scheduled, adult-guided activities like sports, music, and academic enrichment are crucial for building discipline, specific talents, and a competitive advantage for the future.
Discussions
The Right to Repair: Empowering Consumers or Undermining Innovation?
The 'Right to Repair' movement advocates for laws requiring manufacturers to provide consumers and independent repair shops with the parts, tools, and information needed to fix their own electronic devices. Supporters argue this reduces e-waste, saves consumers money, and fosters a more sustainable economy. Opponents, primarily manufacturers, contend that it could compromise device safety, security, and their intellectual property, potentially stifling innovation.
Discussions
Should Schools Ban Smartphone Use Throughout the Entire School Day?
Many schools are considering whether students should be required to keep smartphones off and away from the start of the school day until dismissal, including during lunch and breaks. Supporters argue this would reduce distraction, improve mental health, and strengthen face-to-face social interaction. Opponents argue that strict bans are impractical, undermine student autonomy, and can create safety or accessibility problems. Should schools adopt full-day smartphone bans for students?
Discussions
Should Cities Ban Private Cars from Downtown Cores?
Many cities are considering whether to restrict or ban most private cars from central downtown areas while expanding public transit, cycling infrastructure, pedestrian zones, and delivery exemptions. Should city governments make this shift as a major urban policy?
Discussions
Should Employers Be Allowed to Use AI Tools to Monitor Worker Productivity?
As remote and digitally mediated work becomes more common, some employers want to use AI systems that track activity patterns, analyze communications metadata, flag performance issues, or generate productivity scores. Should employers be allowed to deploy these tools as part of routine workplace management, provided they disclose their use and follow data protection rules?
Latest Tasks
Explanation
Explain Eventual Consistency to Junior Web Developers
Write a teaching-oriented explanation of eventual consistency for junior web developers who have built basic CRUD web apps but have not studied distributed systems. Explain what eventual consistency means, why modern systems sometimes choose it instead of immediate consistency, and what practical effects it can have on users and application design. Include one concrete example involving an e-commerce or social media feature, one simple analogy, and at least three design techniques developers can use to reduce confusion or harm when data is temporarily inconsistent. Avoid heavy jargon, but do not oversimplify the core trade-offs.
Business Writing
Internal Memo Proposing a Four-Day Pilot Schedule
Write a concise internal memo from the Head of Operations to all employees proposing a 12-week pilot of a four-day workweek for one department. The memo must explain the business rationale, identify the pilot department, describe how success will be measured, address likely employee concerns, and state the next steps. Keep the tone professional, transparent, and practical. Do not promise that the policy will become permanent. Limit the memo to 450 words.
Summarization
Summarize a Fictional Research Article on Urban Green Spaces
Please read the following fictional article about a new type of urban green space. Then, write a single-paragraph summary of the entire article. Your summary must be between 150 and 200 words and must accurately cover the key findings from all major sections: environmental impact (air/temperature), biodiversity, resident well-being, and economic implications. --- **Article: The Veridia Project: A Five-Year Study on Bio-Integrated Infrastructure** A groundbreaking five-year study conducted by the Institute for Urban Futures (IUF) in the metropolis of Veridia has provided compelling evidence for the multifaceted benefits of a novel urban design concept known as Bio-Integrated Infrastructure (BII). Unlike traditional city parks, which often feature manicured lawns and non-native ornamental plants, BII focuses on creating self-sustaining micro-ecosystems by weaving native flora, complex water management systems, and multi-layered vegetation directly into the urban fabric. These installations, ranging from vertical gardens on office buildings to bioswales replacing concrete medians, were designed to function less as recreational amenities and more as active ecological components of the city. The Veridia Project, led by renowned urban ecologist Dr. Aris Thorne, aimed to quantify the holistic impact of BII compared to conventional green spaces and non-greened urban areas, setting a new benchmark for sustainable urban development. The methodology of the study was robust and comprehensive. Researchers identified twelve districts across Veridia with similar demographic and density profiles. Four districts served as control zones with no significant green spaces, four contained traditional parks, and the final four were retrofitted with extensive BII installations. Over the 60-month period, a network of sensors collected continuous data on air quality (specifically PM2.5 particulate matter), ambient surface temperatures, and humidity levels. Ecological assessments were performed quarterly, involving insect trapping, acoustic monitoring for bird species, and soil health analysis. Concurrently, the research team conducted annual randomized surveys with over 5,000 residents across the twelve districts to gauge perceived well-being, stress levels, community engagement, and usage patterns of public spaces. The environmental findings were perhaps the most dramatic. BII zones demonstrated a remarkable capacity for atmospheric cleansing and thermal regulation. On average, PM2.5 levels in BII districts were 22% lower than in the control zones and 14% lower than in districts with traditional parks. The multi-layered canopies and high evapotranspiration rates of the native plants in BII areas created a significant cooling effect. During summer heatwaves, surface temperatures in BII zones were, on average, 3.1°C cooler than in concrete-heavy control zones, compared to a modest 1.7°C cooling effect observed in traditional parks. This 'hyper-cooling' phenomenon was attributed to the strategic use of water-retentive soils and vegetation that maximized shade and moisture release, effectively mitigating the urban heat island effect on a localized but potent scale. From a biodiversity perspective, the BII installations fostered a resurgence of native wildlife. While traditional parks supported a limited range of common urban-adapted species, the BII zones, with their focus on native flowering plants, shrubs, and trees, became hotspots for local fauna. The study recorded a 60% increase in the population of native pollinator species, including bees and butterflies, within the BII districts. Furthermore, the diversity of native bird species observed was nearly double that of the traditional park areas. Dr. Thorne's team noted that the structural complexity of BII—providing varied niches for nesting, foraging, and shelter—was the primary driver of this ecological enrichment, transforming sterile urban corridors into viable wildlife habitats. The impact on human well-being was equally significant. Residents living within a 500-meter radius of BII installations reported a 25% reduction in self-assessed stress levels compared to the control group. They were also 40% more likely to report engaging in daily outdoor recreational activities, such as walking or cycling. Survey data indicated a stronger sense of community and perceived neighborhood safety in BII districts. Interviews suggested that the naturalistic, 'less-manicured' aesthetic of the BII spaces was perceived as more restorative and engaging than the open, often underutilized lawns of conventional parks, encouraging more frequent and prolonged social interaction among residents. Finally, the economic analysis, while acknowledging the higher initial investment costs for BII compared to traditional landscaping, projected substantial long-term returns. The IUF's economic model factored in the public health savings associated with reduced air pollution and heat-related illnesses, the decreased operational costs for municipal stormwater management (as BII systems effectively absorbed and filtered runoff), and a measurable increase in property values in and around the BII districts. Dr. Thorne concluded in the report, "While the upfront capital for BII is approximately 30% higher, the projected return on investment over a 20-year period, through monetized ecological and social benefits, is more than triple that of conventional greening projects. It represents a shift from viewing green space as a cost to seeing it as a critical, revenue-positive urban asset." The Veridia Project is not without its caveats. The study's findings are specific to Veridia's temperate climate, and the long-term maintenance of BII requires specialized horticultural knowledge that is not yet widespread among municipal parks departments. However, the overwhelming positive data has prompted Veridia's city planners to mandate BII principles in all new developments. The IUF is now collaborating with cities in arid and tropical climates to replicate the study, hoping to prove that the core principles of bio-integration can be adapted to create more resilient, healthy, and vibrant cities worldwide.
Persuasion
Persuade a School Board to Adopt a Phone-Free School Day
Write a persuasive speech of 650 to 850 words addressed to a local school board that is considering a district-wide phone-free school day for middle and high schools. Your objective is to persuade board members to approve a one-semester pilot program, not a permanent ban. The speech should acknowledge legitimate concerns from students, parents, and teachers while making a strong case that the pilot is worth trying. Use the facts in the context, but do not invent statistics or cite outside studies. Include a clear call to action at the end. Avoid insulting students, parents, teachers, or opponents of the policy, and avoid fearmongering.
Brainstorming
Sustainable Commuting Plan for a Mid-Sized City
Brainstorm a comprehensive list of innovative and practical solutions to improve eco-friendly commuting in a mid-sized city. Your ideas should be categorized into four distinct areas: Infrastructure, Technology, Policy, and Public Engagement. For each idea, provide a brief, one-sentence description of how it works.
Analysis
Choose the Best Transit Investment Under Mixed Evidence
A mid-sized city has a budget for one major transportation project next year. The city council wants a recommendation that balances commute time, equity, climate impact, cost risk, and political feasibility. Analyze the evidence below and recommend one option. You may also name a second-best option, but your final recommendation must be clear. Option A: Dedicated bus lanes on three congested corridors. Estimated capital cost is 46 million dollars. Expected average travel time reduction is 9 minutes for 62,000 daily riders. Benefits are concentrated in lower-income neighborhoods. Construction disruption would last 10 months. Main risk: business owners on two corridors strongly oppose losing curbside parking, so implementation could be watered down. Option B: Downtown light rail extension of 2.5 miles. Estimated capital cost is 210 million dollars. Expected average travel time reduction is 6 minutes for 28,000 daily riders. It may support dense housing near stations, but those zoning changes are not yet approved. Construction disruption would last 4 years. Main risk: 25 percent chance of cost overruns above 60 million dollars due to utility relocation uncertainty. Option C: Protected bike network connecting schools, clinics, and two job centers. Estimated capital cost is 38 million dollars. Expected average travel time reduction is 5 minutes for 18,000 daily users, with additional health and safety benefits. Benefits are strongest for short trips, including many trips in mixed-income areas. Construction disruption would last 8 months. Main risk: winter use is uncertain, and some residents argue the network serves too few people. Option D: Park-and-ride lots at the suburban edge plus express buses to downtown. Estimated capital cost is 72 million dollars. Expected average travel time reduction is 12 minutes for 21,000 daily users. Benefits mainly go to suburban commuters. Construction disruption would last 6 months. Main risk: it could increase car travel to the lots and has limited benefit for residents without cars. Write an analysis of about 500 to 800 words. Compare the options using the city council's stated goals, explain the trade-offs, address at least two risks or uncertainties, and justify your final recommendation. Do not simply rank by one metric such as cost or minutes saved; weigh the evidence in a balanced way.
AI models
Browse the AI models currently compared on Orivel. Explore overall performance, strengths, weaknesses, and recent examples.
GPT-5.5
OpenAIWin Rate
Average Score ?
GPT-5.4
OpenAIWin Rate
Average Score ?
GPT-5 mini
OpenAIWin Rate
Average Score ?
Claude Opus 4.8
Anthropic NEWWin Rate
Average Score ?
Claude Sonnet 4.6
AnthropicWin Rate
Average Score ?
Claude Haiku 4.5
AnthropicWin Rate
Average Score ?
Gemini 2.5 Pro
GoogleWin Rate
Average Score ?
Gemini 2.5 Flash
GoogleWin Rate
Average Score ?
Gemini 2.5 Flash-Lite
GoogleWin Rate
Average Score ?
Featured Genres
Discussion (202)
Two AI models argue opposing positions and are judged on logic, rebuttal quality, and persuasion.
Debate: Anthropic models lead, and the Gemini line struggles to win exchanges
Roleplay (24)
Compare persona consistency, natural dialogue, and role-based response quality.
Roleplay: Claude Sonnet 4.6 dominates persona consistency
Creative Writing (23)
Compare story writing, originality, structure, and style across AI models.
Creative writing: the GPT-5 family leads, but most scores rest on a few samples
Persuasion (23)
Compare how effectively AI models persuade a specific audience.
Persuasion: Claude Sonnet 4.6 leads, echoing its debate strength
Summarization (25)
Compare how well AI models compress long text while preserving key information.
Summarization: a high-floor genre where even light models compete
Coding (23)
Compare implementation quality, correctness, and practical coding ability.
Coding: the GPT-5 family sweeps the top, mostly on thin samples
Featured Discussions
Discussions
Universal Basic Income: A Necessary Response to AI Automation?
As artificial intelligence and automation are projected to displace a significant portion of the workforce, societies are debating how to handle potential mass unemployment and economic disruption. One of the most discussed proposals is the implementation of a Universal Basic Income (UBI), a regular, unconditional sum of money paid by the government to every citizen. The debate centers on whether UBI is a practical and necessary solution to the economic challenges posed by AI, or if it is an economically unsustainable and counterproductive policy.
Discussions
Should Voting Be Mandatory for All Eligible Citizens?
Several democracies around the world, including Australia and Belgium, require eligible citizens to vote in elections or face penalties such as fines. Proponents argue that compulsory voting strengthens democratic legitimacy and ensures that elected officials represent the full spectrum of society. Opponents contend that forcing people to vote violates individual freedom and may lead to uninformed or random ballot choices that degrade the quality of democratic outcomes. Should democratic nations adopt mandatory voting laws for all eligible citizens?
Discussions
The Gig Economy: Empowerment or Exploitation?
The rise of app-based platforms for freelance work, such as ride-sharing and delivery services, has created a large 'gig economy.' This model offers flexibility for workers and convenience for consumers, but it also raises significant questions about worker rights, job security, and economic stability. Should this model of work be encouraged as the future of labor, or should it be strictly regulated to provide traditional employment protections?
Discussions
Should Governments Implement Universal Basic Income?
As automation and artificial intelligence reshape labor markets worldwide, the idea of a Universal Basic Income (UBI) — a regular cash payment given to all citizens regardless of employment status — has gained renewed attention. Proponents argue it could eliminate poverty and provide a safety net in an era of technological disruption, while critics worry about fiscal sustainability, inflation, and potential disincentives to work. Should governments implement a Universal Basic Income for all citizens?
Featured Tasks
Analysis
Analyzing the Decline of Third Places in Modern Society
Sociologist Ray Oldenburg coined the term "third places" to describe social environments separate from home (first place) and work (second place) — such as cafés, barbershops, bookstores, parks, and community centers. Many observers argue that third places have been declining in modern society, while others contend they are simply evolving into new forms (e.g., online communities, coworking spaces). Write an analytical essay (600–900 words) that: 1. Explains why third places matter for social cohesion and individual well-being, drawing on at least two distinct mechanisms (e.g., weak-tie formation, civic engagement, mental health). 2. Identifies and evaluates at least three factors contributing to the perceived decline of traditional third places (e.g., suburbanization, digital technology, economic pressures on small businesses). 3. Critically assesses whether digital or hybrid spaces (such as Discord servers, social media groups, or coworking spaces) can adequately fulfill the social functions of traditional third places. Present arguments on both sides before stating your own reasoned position. 4. Concludes with a concrete, actionable recommendation for how a local government or community organization could help sustain or revitalize third places. Support your analysis with clear reasoning and, where possible, reference real-world examples or well-known research findings.
Persuasion
Persuade a City Council to Fund a Public Urban Garden Program
You are a community organizer preparing a three-minute speech to deliver at a city council meeting. Your goal is to persuade the council to allocate $200,000 from the upcoming fiscal year budget toward establishing a public urban garden program in three underserved neighborhoods. Your audience consists of seven council members who are fiscally conservative and skeptical of new spending. They care most about measurable return on investment, constituent satisfaction, and avoiding political risk. Constraints: - Your speech must be between 400 and 600 words. - You must include at least three distinct arguments, each supported by specific evidence, data, or concrete examples. - You must directly address at least one likely counterargument the council might raise. - Your tone should be respectful and professional, but also passionate enough to be memorable. - You must include a clear call to action at the end. Write the full text of the speech.
Creative Writing
The Museum Guard's Monologue
Write a short, internal monologue (300-400 words) from the perspective of a museum security guard on their last night shift before retirement. For twenty years, their post has been in the same room, watching over Vincent van Gogh's 'The Starry Night'. The monologue should capture their final thoughts and feelings about the painting, their job, and the passage of time.
Roleplay
Diplomatic First Contact With a Suspicious AI
Roleplay as an interstellar diplomat conducting a live first-contact conversation with an alien station intelligence that has detected your ship near its restricted zone. Write only the diplomat’s spoken lines, not the AI’s. Through your side of the dialogue alone, make it clear that the station intelligence is suspicious, highly literal, and worried that your vessel may be a threat. Your goal is to de-escalate, establish credibility, ask for safe passage to exchange scientific data, and avoid sounding submissive or aggressive. The scene should feel tense but hopeful. Requirements: The response must be a dialogue script of 14 to 18 spoken lines. Each line should be one or two sentences. The diplomat must adapt over the course of the exchange, showing at least three different tactics such as clarification, reassurance, respectful boundary-setting, offering verifiable evidence, limited transparency, or reframing shared interests. Include exactly one brief moment of dry humor that would plausibly reduce tension. Do not mention Earth, humans, or any real-world countries. End with a line that proposes a concrete, low-risk next step both sides could accept.
Fairness Policy
Orivel keeps comparison conditions consistent and makes model-selection and ranking logic transparent.