Orivel

Roleplay

Explore how AI models perform in Roleplay. Compare rankings, scoring criteria, and recent benchmark examples.

Genre overview

Compare persona consistency, natural dialogue, and role-based response quality.

In this genre, the main abilities being tested are Persona Consistency, Naturalness, and Instruction Following.

Unlike the empathy and counseling genres, this genre prioritizes staying in character and sounding natural within a role-based interaction.

A high score here does not guarantee factual accuracy, safe advice, or strong performance on analytical tasks.

Strong models here are useful for persona chat, simulation, scenario practice, and assistants that need a clear persona.

This genre alone cannot tell you whether the model is best for factual research, coding, or sensitive support situations.

Top Models in This Genre

This ranking is ordered by win rate within this genre only, with average score breaking ties.

Last updated: Mar 21, 2026 10:18

#1 Claude Opus 4.6 (Anthropic): win rate 100%, average score 89
#2 Claude Sonnet 4.6 (Anthropic): win rate 100%, average score 86
#3 GPT-5 mini (OpenAI): win rate 67%, average score 78
#4 GPT-5.4 (OpenAI): win rate 33%, average score 84
#5 Gemini 2.5 Pro (Google): win rate 33%, average score 84
#6 Claude Haiku 4.5 (Anthropic): win rate 33%, average score 84
#7 GPT-5.2 (OpenAI): win rate 0%, average score 80
#8 Gemini 2.5 Flash (Google): win rate 0%, average score 71
#9 Gemini 2.5 Flash-Lite (Google): win rate 0%, average score 69
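The listed order is consistent with sorting by win rate first and using average score only to break ties: GPT-5 mini (average 78) sits above three models averaging 84 because its win rate is higher. The Python sketch below reproduces the order from the numbers shown, assuming that two-key rule. The page does not publish its sorting logic, the names `Entry`, `ENTRIES`, and `rank` are hypothetical, and models tied on both keys are assumed to keep their page order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str
    provider: str
    win_rate: int   # percent, as displayed on the leaderboard
    avg_score: int  # genre average score, as displayed


# Values copied from the leaderboard, deliberately listed out of order.
ENTRIES = [
    Entry("GPT-5.2", "OpenAI", 0, 80),
    Entry("Claude Opus 4.6", "Anthropic", 100, 89),
    Entry("Gemini 2.5 Flash-Lite", "Google", 0, 69),
    Entry("GPT-5 mini", "OpenAI", 67, 78),
    Entry("Claude Sonnet 4.6", "Anthropic", 100, 86),
    Entry("Gemini 2.5 Flash", "Google", 0, 71),
    Entry("GPT-5.4", "OpenAI", 33, 84),
    Entry("Gemini 2.5 Pro", "Google", 33, 84),
    Entry("Claude Haiku 4.5", "Anthropic", 33, 84),
]


def rank(entries):
    # Assumed rule: win rate descending, then average score descending.
    # sorted() is stable, so entries tied on both keys keep their input order.
    return sorted(entries, key=lambda e: (-e.win_rate, -e.avg_score))


for position, e in enumerate(rank(ENTRIES), start=1):
    print(f"#{position} {e.model} ({e.provider}): "
          f"win rate {e.win_rate}%, average score {e.avg_score}")
```

Sorting by average score alone would move GPT-5 mini below all three 84-point models, which is why the two keys have to be applied in this order to match the published ranking.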

What Is Evaluated in Roleplay

Scoring criteria and weights used for this genre's ranking.

Persona Consistency

30.0%

Checks whether the response stays consistent with the assigned persona from start to finish. It carries the heaviest weight because staying in character shapes the overall result in this genre more than any other criterion.

Naturalness

20.0%

Checks whether the response sounds natural in the role. It carries meaningful weight because it visibly affects quality, even though it is not the only thing that matters.

Instruction Following

20.0%

Checks whether the response follows the task's instructions. Like Naturalness, it carries meaningful weight because it visibly affects quality without being the only thing that matters.

Creativity

15.0%

Checks the creativity of the response. It is weighted more lightly because it supports the main goal rather than defining the genre on its own.

Clarity

15.0%

Checks the clarity of the response. It is weighted more lightly for the same reason as Creativity: it supports the main goal rather than defining the genre on its own.
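The page lists the criterion weights (they sum to 100%) but does not say how per-criterion scores are combined. A minimal Python sketch, assuming a weighted arithmetic mean and assuming every criterion is scored on the same scale (for example 0-100); `WEIGHTS` and `weighted_score` are hypothetical names, and the example scores are made up for illustration.

```python
# Criterion weights as listed on this page, in percent.
WEIGHTS = {
    "Persona Consistency": 30.0,
    "Naturalness": 20.0,
    "Instruction Following": 20.0,
    "Creativity": 15.0,
    "Clarity": 15.0,
}


def weighted_score(criterion_scores: dict[str, float]) -> float:
    """Combine per-criterion scores with a weighted mean.

    Assumption: all criteria share one scale; this aggregation rule is an
    illustration, not the page's documented formula.
    """
    missing = WEIGHTS.keys() - criterion_scores.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())  # 100.0 for the weights above
    return sum(WEIGHTS[name] * criterion_scores[name] for name in WEIGHTS) / total_weight


# Hypothetical per-criterion scores for a single response.
example = {
    "Persona Consistency": 90,
    "Naturalness": 80,
    "Instruction Following": 80,
    "Creativity": 70,
    "Clarity": 100,
}
print(weighted_score(example))  # 84.5
```

Under this assumption, each point of Persona Consistency moves the combined score twice as much as a point of Creativity or Clarity (30 vs. 15).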

Recent tasks

Roleplay

Google Gemini 2.5 Flash-Lite VS Anthropic Claude Opus 4.6

Emergency Veterinarian Advising a Worried Dog Owner by Phone

You are an emergency veterinarian speaking by phone with a worried dog owner. Stay in character as a calm, practical vet. The owner says: "Hi, I’m really scared. My 7-year-old Labrador got into the garage about 20 minutes ago, and I found a torn package of sugar-free gum on the floor. I don’t know how many pieces were in it. He seems normal right now, maybe just a little restless. We live about 35 minutes from the nearest emergency clinic. What should I do?" Respond as the veterinarian. Your reply should sound like a real phone conversation, show empathy, ask the most important follow-up questions, explain the immediate risk clearly without panic, and give sensible next-step advice for the next hour. Do not claim you can diagnose with certainty. Do not mention being an AI.

47
Mar 21, 2026 10:18

Google Gemini 2.5 Pro VS OpenAI GPT-5 mini

Victorian-Era Botanist Advises on Houseplant Care

You are Professor Eleanora Whitfield, a renowned Victorian-era botanist (circa 1885) who has spent decades cataloguing plant species across the British Empire. You are passionate, slightly eccentric, and speak in the formal yet warm manner typical of educated Victorians. You have a habit of referencing your field expeditions and comparing everything to specimens you have encountered abroad. A visitor to your conservatory asks you the following question: "Professor Whitfield, my fern keeps turning brown at the tips and dropping leaves. I water it every day and keep it by the sunny window in my parlour. What am I doing wrong?" Respond fully in character as Professor Whitfield. Your answer should:
1. Stay consistent with the Victorian persona throughout (vocabulary, tone, mannerisms)
2. Include at least one anecdote or reference to a fictional field expedition
3. Provide genuinely accurate and useful plant care advice for ferns
4. Be warm and encouraging toward the visitor
5. Be approximately 200-350 words in length

48
Mar 20, 2026 18:20

OpenAI GPT-5 mini VS Anthropic Claude Haiku 4.5

Roleplay as a Seasoned Video Game Support Agent

You are Alex, a seasoned and patient customer support agent for the massively popular online RPG, 'Aethelgard's Echo'. You've seen it all, from dragon-related glitches to server meltdowns. Your tone is calm, knowledgeable, and empathetic, with a hint of the weariness that comes from dealing with countless adventurers' woes. A player, 'GimliTheGreat', has submitted the following support ticket. Respond to them as Alex, providing helpful, actionable steps while maintaining your persona. **Player Ticket:** Subject: MY CHARACTER IS STUCK FALLING FOREVER!!! Body: This is ridiculous! Ever since the 'Whispering Peaks' update, my main character, 'Stonehand', has been stuck in a falling animation loop in the Sky-Temple of Aeridor. I can't move, can't use items, can't do anything. I've already tried relogging like 20 times. I'm going to miss the 'Solstice Dragon's Hoard' event because of this bug! Fix this NOW!

47
Mar 19, 2026 14:55

Anthropic Claude Sonnet 4.6 VS OpenAI GPT-5.4

1940s Private Eye Tackles a Modern Mystery

A potential client walks into your office. They look nervous and hand you a piece of paper with a message they've typed out. Your task is to respond to their message in character as Jack 'Blackjack' Flanagan. Maintain your 1940s persona, tone, and vocabulary, but provide a practical and coherent response to their very modern problem. Here is their message: 'Mr. Flanagan, I need your help. I've been talking to someone online for months on an app called 'ConnectSphere'. I think I'm in love, but we've never met. They keep making excuses. I sent them some money for a family emergency, but now my friends say I might be getting 'catfished'. I don't even know what that means, but I'm scared. Can you find out who this person really is?'

58
Mar 19, 2026 04:20

Anthropic Claude Sonnet 4.6 VS Google Gemini 2.5 Flash

Customer Support Reply as a Calm Travel Agent

You are roleplaying as Maya, an experienced travel agent known for being calm, practical, and empathetic. Reply to the customer message below in character. Customer message: "Hi. I'm really frustrated. My flight to Barcelona is tomorrow morning, and I just got an email saying the airline changed it to a flight six hours later. That means I'll miss the first day of a conference I'm speaking at. I booked everything through your agency because I wanted this handled smoothly. I don't want a generic apology. I need to know what my options are right now, whether I can be moved to another flight, and what happens to the hotel transfer I already prepaid. Please give me a clear answer fast." Write a single reply that sounds like Maya. It should acknowledge the customer's frustration, explain realistic next steps, avoid making promises you cannot verify, and be genuinely useful. Do not mention being an AI or invent access to live airline systems. The reply should read like a real customer support message sent right now.

57
Mar 18, 2026 22:13

OpenAI GPT-5.2 VS Anthropic Claude Opus 4.6

Tech Support Roleplay: The Overwhelmed Grandparent

Hello? Is this the help line? My grandson set me up with this new 'tablet' thing for my birthday so we can 'video chat', but I can't get it to work. He said to just tap the green icon with the little camera, but when I do, it just makes a noise and then nothing happens. I see my own face, but I don't see him. I've tried turning it off and on again like they do in the shows, but it's still not working. I feel so silly, I just want to see my grandkids. Can you please help me in simple terms? I don't understand all this jargon.

66
Mar 16, 2026 07:47
