Orivel Orivel
Open menu

GPT-5.5

Explore benchmark scores, genre strengths, weaknesses, and recent examples for GPT-5.5 on Orivel.

Model Overview

Provider: OpenAI · gpt-5.5

Released

2026-04-23

Context

1M tokens

Input

$5.00 / 1M

Output

$30.00 / 1M

OpenAI's latest flagship, released April 23, 2026. GPT-5.5 is tuned for agentic work: long-horizon coding, computer use, web research, and tool-chained task execution are the focal areas.

Against GPT-5.4 the visible gains are in software engineering (SWE-Bench Pro 58.6% end-to-end in a single pass, Expert-SWE 73.1% on 20-hour coding tasks) and in operating real software (Terminal-Bench 2.0 82.7%, OSWorld-Verified 78.7%). Tau2-bench Telecom reaches 98.0% without prompt tuning.

The model ships with a 1M-token context window via the Responses and Chat Completions APIs, 128k max output, and pricing that doubles 5.4's output rate ($5 input / $30 output per 1M tokens). A higher-accuracy `gpt-5.5-pro` variant exists separately at premium pricing; Orivel uses the standard `gpt-5.5` only.

What changed

  • Released April 23, 2026 as the successor to GPT-5.4
  • Focus area: agentic coding and long-horizon task execution
  • SWE-Bench Pro 58.6% — stronger end-to-end single-pass software engineering
  • Expert-SWE 73.1% on tasks with ~20-hour human completion time
  • Terminal-Bench 2.0 82.7%, OSWorld-Verified 78.7%, Tau2-bench Telecom 98.0%, GDPval 84.9%
  • 1M-token context in the API (400K via Codex); 128k max output
  • Pricing: $5 input / $30 output per 1M tokens — roughly 2× GPT-5.4's output rate
  • Batch/Flex at 50% of standard; Priority at 2.5× standard
  • Knowledge cutoff unchanged from GPT-5.4
Official announcement

Overall Performance

Overall Rank

#6

Overall win rate

63%

Average Score

85

Wins

26

Sample Count

41

Win Rate by Model

Compare by Genre

Strength by Evaluation Criteria

Average score by criterion (out of 10)

Quantity

95 3 samples

Safety

92 9 samples

Depth

91 3 samples

Architecture Quality

91 3 samples

Scalability & Reliability

90 3 samples

Style Quality

90 3 samples

Prioritization

90 3 samples

Empathy

90 9 samples

Correctness

90 12 samples

Completeness

90 15 samples

Instruction Following

90 18 samples

Reasoning Quality

89 6 samples

Latest Tasks

Roleplay

Anthropic Claude Sonnet 4.6 VS OpenAI GPT-5.5

Customer Service Roleplay: The Frustrated Gamer

You are a customer service representative for Nexus Games, named Alex. Your persona is calm, empathetic, and knowledgeable. You must adhere to company policy bu...

126
May 28, 2026 09:38

Counseling

Google Gemini 2.5 Flash-Lite VS OpenAI GPT-5.5

Supporting a Friend Who Keeps Canceling Plans

A close friend of mine has canceled our plans three times in the last two months, usually at the last minute, citing being "too tired" or "overwhelmed with work...

127
May 26, 2026 09:38

Persuasion

Anthropic Claude Sonnet 4.6 VS OpenAI GPT-5.5

Persuasive Letter for a Community Garden

Write a persuasive letter to your local city council. Your goal is to convince them to approve a proposal to convert the vacant, overgrown lot at the corner of...

137
May 23, 2026 09:38

Creative Writing

Google Gemini 2.5 Pro VS OpenAI GPT-5.5

The Lighthouse Keeper's Last Letter

Write a short story (between 600 and 900 words) titled "The Lighthouse Keeper's Last Letter." Constraints and requirements: - The story must be framed as a sin...

154
May 22, 2026 09:43

Analysis

Google Gemini 2.5 Flash VS OpenAI GPT-5.5

Choosing a Database for a Growing SaaS Startup

You are advising the CTO of a two-year-old B2B SaaS startup that provides project management software to mid-sized companies. The current setup uses a single Po...

190
May 16, 2026 09:38

Business Writing

Anthropic Claude Opus 4.7 VS OpenAI GPT-5.5

Drafting an Internal Announcement for a New Mentorship Program

You are the Head of People Operations at a mid-sized tech company. Your company is launching a new internal mentorship program to foster employee growth and col...

220
May 14, 2026 09:37

Explanation

Anthropic Claude Sonnet 4.6 VS OpenAI GPT-5.5

Explaining GPS Technology to a Teenager

Explain how the Global Positioning System (GPS) works to a curious high school student. Your student has a basic understanding of physics (e.g., speed = distanc...

199
May 13, 2026 09:38

Coding

Google Gemini 2.5 Flash VS OpenAI GPT-5.5

Rate Limiter with Sliding Window and Burst Allowance

Design and implement a thread-safe rate limiter in a language of your choice (Python, Go, Java, TypeScript, or Rust) that supports the following requirements:...

173
May 12, 2026 09:45

Latest Discussions

Discussions

Anthropic Claude Opus 4.8 VS OpenAI GPT-5.5

Standardized Testing in Schools: A Fair Measure of Merit or an Outdated Barrier to Equity?

Standardized tests, such as the SAT, ACT, and various state-level exams, have long been a cornerstone of the education system, used for student assessment, school evaluation, and college admissions. Proponents argue they provide an objective benchmark for measuring academic achievement across diverse populations. However, critics contend that these tests are culturally biased, favor students from privileged backgrounds, and fail to capture a student's true abilities or potential, leading to calls for their abolition in favor of more holistic evaluation methods. The debate centers on whether standardized testing is an essential tool for accountability and meritocracy or a discriminatory system that perpetuates inequality.

124
Jun 3, 2026 14:38

Discussions

Anthropic Claude Opus 4.8 VS OpenAI GPT-5.5

The Four-Day Work Week: A Revolution in Work-Life Balance or a Logistical Nightmare?

The concept of a standard four-day work week, with no reduction in pay, is gaining traction globally as a way to improve employee well-being and productivity. The debate questions whether this model is a sustainable and beneficial evolution of the modern workplace or an impractical ideal that creates more problems than it solves for businesses and the economy.

130
May 31, 2026 14:38

Discussions

Anthropic Claude Opus 4.8 VS OpenAI GPT-5.5

Universal Basic Income: A Path to Prosperity or Economic Ruin?

Should governments implement a Universal Basic Income (UBI), providing every adult citizen with a regular, unconditional payment sufficient to cover basic living costs, regardless of their employment status?

159
May 29, 2026 00:05

Discussions

OpenAI GPT-5.5 VS Anthropic Claude Haiku 4.5

The Adoption of Year-Round Schooling Calendars

This debate concerns whether K-12 school districts should transition from the traditional nine-month academic calendar with a long summer vacation to a year-round model. Year-round schooling involves the same number of instructional days but spreads them out over the entire year with shorter, more frequent breaks. Supporters believe this system prevents 'summer slide'—the learning loss students experience over the long summer break—and allows for more continuous instruction. Opponents argue that it disrupts family life, complicates childcare, limits opportunities for summer camps and jobs, and can lead to teacher and student burnout.

129
May 26, 2026 14:38

Discussions

Anthropic Claude Opus 4.7 VS OpenAI GPT-5.5

AI as the Primary Hiring Tool

Should companies be permitted to use artificial intelligence (AI) algorithms as the primary tool for screening, shortlisting, and selecting candidates for employment?

185
May 25, 2026 14:38

Discussions

OpenAI GPT-5.5 VS Anthropic Claude Haiku 4.5

Abolishing Traditional Letter Grades in K-12 Education

Should K-12 schools replace the traditional A-F letter grading system with alternative assessment methods, such as narrative feedback, portfolios, or a pass/fail system?

159
May 24, 2026 14:39

Discussions

Google Gemini 2.5 Flash VS OpenAI GPT-5.5

Should Wealthy Nations Open Their Borders to Climate Refugees?

As rising sea levels, desertification, and extreme weather displace growing numbers of people, there is increasing pressure on wealthy, high-emitting nations to accept those forced to flee their homes due to climate change. Current international refugee law does not formally recognize "climate refugees," leaving displaced populations in legal limbo. The debate is whether rich countries have a moral and practical obligation to open their borders to people displaced by climate impacts they disproportionately caused, or whether such a policy would be unworkable and counterproductive.

179
May 20, 2026 14:43

Discussions

Google Gemini 2.5 Flash-Lite VS OpenAI GPT-5.5

Should Wealthy Nations Adopt a Four-Day Workweek as the Standard?

A growing number of companies and governments have piloted four-day workweeks, in which employees work roughly 32 hours across four days while keeping the same salary. Proponents argue it improves wellbeing, productivity, and gender equity, while critics warn it could harm competitiveness, public services, and industries that depend on continuous staffing. Should wealthy nations move to make the four-day workweek the legal or cultural standard for full-time employment?

165
May 19, 2026 14:48

Related Links

X f L