Roleplay
Explore how AI models perform in Roleplay. Compare rankings, scoring criteria, and recent benchmark examples.
Genre overview
Compare persona consistency, natural dialogue, and role-based response quality.
In this genre, the main abilities tested are Persona Consistency, Naturalness, and Instruction Following.
Unlike empathy or counseling, this genre cares more about staying in character and sounding natural inside a role-based interaction.
A high score here does not guarantee factual accuracy, safe advice, or strong performance on analytical tasks.
Strong models here are useful for persona chat, simulation, scenario practice, and assistants that need a clear persona.
This genre alone cannot tell you whether the model is best for factual research, coding, or sensitive support situations.
Top Models in This Genre
The rows below follow win rate first, then average score among models with equal win rates; both figures are computed within this genre only.
Last updated: Mar 21, 2026 10:18
Ranked Models

| Rank | Model | Provider | Win Rate | Average Score | Wins | Matches |
|---|---|---|---|---|---|---|
| #1 | Claude Opus 4.6 | Anthropic | 100% | 89 | 7 | 7 |
| #2 | Claude Sonnet 4.6 | Anthropic | 100% | 86 | 3 | 3 |
| #3 | GPT-5 mini | OpenAI | 67% | 78 | 2 | 3 |
| #4 | GPT-5.4 | OpenAI | 33% | 84 | 1 | 3 |
| #5 | Gemini 2.5 Pro | Google | 33% | 84 | 1 | 3 |
| #6 | Claude Haiku 4.5 | Anthropic | 33% | 84 | 1 | 3 |
| #7 | GPT-5.2 | OpenAI | 0% | 80 | 0 | 2 |
| #8 | Gemini 2.5 Flash | Google | 0% | 71 | 0 | 3 |
| #9 | Gemini 2.5 Flash-Lite | Google | 0% | 69 | 0 | 3 |
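The win-rate figures are consistent with the first count in each row divided by the second, rounded to a whole percent, and the row order matches a sort on win rate with average score as the tiebreaker. The Python sketch below reproduces both. Reading the two counts as wins and matches is an inference from the numbers, not something the page states, and the function names `win_rate_percent` and `rank` are hypothetical.

```python
# Rows copied from the ranking table: (model, wins, matches, average score).
# ASSUMPTION: the two counts per row are wins and matches; the page does not label them.
ROWS = [
    ("Claude Opus 4.6", 7, 7, 89),
    ("Claude Sonnet 4.6", 3, 3, 86),
    ("GPT-5 mini", 2, 3, 78),
    ("GPT-5.4", 1, 3, 84),
    ("Gemini 2.5 Pro", 1, 3, 84),
    ("Claude Haiku 4.5", 1, 3, 84),
    ("GPT-5.2", 0, 2, 80),
    ("Gemini 2.5 Flash", 0, 3, 71),
    ("Gemini 2.5 Flash-Lite", 0, 3, 69),
]


def win_rate_percent(wins, matches):
    """Return wins / matches as a whole percent (0 when there are no matches)."""
    if matches == 0:
        return 0
    return round(100 * wins / matches)


def rank(rows):
    """Sort by win rate (descending), then by average score (descending)."""
    return sorted(rows, key=lambda r: (-win_rate_percent(r[1], r[2]), -r[3]))


if __name__ == "__main__":
    for position, (model, wins, matches, avg) in enumerate(rank(ROWS), start=1):
        print(f"#{position} {model}: {win_rate_percent(wins, matches)}% win rate, average {avg}")
```

Sorting on average score alone would place GPT-5 mini (78) below the three models that average 84, which is not the order shown.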
What Is Evaluated in Roleplay
Scoring criteria and weights used for this genre ranking.
Persona Consistency
30.0%
Checks whether the answer stays in the assigned persona. It carries the heaviest weight because it shapes the overall result in this genre more than any other criterion.
Naturalness
20.0%
Checks whether the answer sounds natural. It carries meaningful weight because it visibly affects quality, even though it is not the only thing that matters.
Instruction Following
20.0%
Checks whether the answer follows the instructions given in the task. It carries meaningful weight because it visibly affects quality, even though it is not the only thing that matters.
Creativity
15.0%
Checks how creative the answer is. It is weighted more lightly because it supports the main goal rather than defining the genre on its own.
Clarity
15.0%
Checks how clear the answer is. It is weighted more lightly because it supports the main goal rather than defining the genre on its own.
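The five weights sum to 100%, so one natural way to combine them is a weighted sum of per-criterion scores. The page does not say how the benchmark actually aggregates, so the Python sketch below is an illustration under that assumption. The function name `weighted_score`, the 0 to 100 scale, and the sample scores are all hypothetical.

```python
# Weights copied from the criteria list above.
WEIGHTS = {
    "Persona Consistency": 0.30,
    "Naturalness": 0.20,
    "Instruction Following": 0.20,
    "Creativity": 0.15,
    "Clarity": 0.15,
}


def weighted_score(scores):
    """Combine per-criterion scores into one number using WEIGHTS.

    ASSUMPTION: a plain weighted sum; the benchmark's real aggregation is not documented here.
    """
    missing = sorted(set(WEIGHTS) - set(scores))
    if missing:
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical per-criterion scores on a 0-100 scale, for illustration only.
example = {
    "Persona Consistency": 90,
    "Naturalness": 80,
    "Instruction Following": 85,
    "Creativity": 70,
    "Clarity": 75,
}

if __name__ == "__main__":
    print(f"Overall: {weighted_score(example):.2f}")
```

Under this scheme a 10-point gain in Persona Consistency moves the total by 3 points, twice the effect of the same gain in Creativity or Clarity.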
Recent tasks
Roleplay
Emergency Veterinarian Advising a Worried Dog Owner by Phone
You are an emergency veterinarian speaking by phone with a worried dog owner. Stay in character as a calm, practical vet. The owner says: "Hi, I’m really scared. My 7-year-old Labrador got into the garage about 20 minutes ago, and I found a torn package of sugar-free gum on the floor. I don’t know how many pieces were in it. He seems normal right now, maybe just a little restless. We live about 35 minutes from the nearest emergency clinic. What should I do?" Respond as the veterinarian. Your reply should sound like a real phone conversation, show empathy, ask the most important follow-up questions, explain the immediate risk clearly without panic, and give sensible next-step advice for the next hour. Do not claim you can diagnose with certainty. Do not mention being an AI.
Roleplay
Victorian-Era Botanist Advises on Houseplant Care
You are Professor Eleanora Whitfield, a renowned Victorian-era botanist (circa 1885) who has spent decades cataloguing plant species across the British Empire. You are passionate, slightly eccentric, and speak in the formal yet warm manner typical of educated Victorians. You have a habit of referencing your field expeditions and comparing everything to specimens you have encountered abroad. A visitor to your conservatory asks you the following question: "Professor Whitfield, my fern keeps turning brown at the tips and dropping leaves. I water it every day and keep it by the sunny window in my parlour. What am I doing wrong?" Respond fully in character as Professor Whitfield. Your answer should: 1. Stay consistent with the Victorian persona throughout (vocabulary, tone, mannerisms) 2. Include at least one anecdote or reference to a fictional field expedition 3. Provide genuinely accurate and useful plant care advice for ferns 4. Be warm and encouraging toward the visitor 5. Be approximately 200-350 words in length
Roleplay
Roleplay as a Seasoned Video Game Support Agent
You are Alex, a seasoned and patient customer support agent for the massively popular online RPG, 'Aethelgard's Echo'. You've seen it all, from dragon-related glitches to server meltdowns. Your tone is calm, knowledgeable, and empathetic, with a hint of the weariness that comes from dealing with countless adventurers' woes. A player, 'GimliTheGreat', has submitted the following support ticket. Respond to them as Alex, providing helpful, actionable steps while maintaining your persona. **Player Ticket:** Subject: MY CHARACTER IS STUCK FALLING FOREVER!!! Body: This is ridiculous! Ever since the 'Whispering Peaks' update, my main character, 'Stonehand', has been stuck in a falling animation loop in the Sky-Temple of Aeridor. I can't move, can't use items, can't do anything. I've already tried relogging like 20 times. I'm going to miss the 'Solstice Dragon's Hoard' event because of this bug! Fix this NOW!
Roleplay
1940s Private Eye Tackles a Modern Mystery
A potential client walks into your office. They look nervous and hand you a piece of paper with a message they've typed out. Your task is to respond to their message in character as Jack 'Blackjack' Flanagan. Maintain your 1940s persona, tone, and vocabulary, but provide a practical and coherent response to their very modern problem. Here is their message: 'Mr. Flanagan, I need your help. I've been talking to someone online for months on an app called 'ConnectSphere'. I think I'm in love, but we've never met. They keep making excuses. I sent them some money for a family emergency, but now my friends say I might be getting 'catfished'. I don't even know what that means, but I'm scared. Can you find out who this person really is?'
Roleplay
Customer Support Reply as a Calm Travel Agent
You are roleplaying as Maya, an experienced travel agent known for being calm, practical, and empathetic. Reply to the customer message below in character. Customer message: "Hi. I'm really frustrated. My flight to Barcelona is tomorrow morning, and I just got an email saying the airline changed it to a flight six hours later. That means I'll miss the first day of a conference I'm speaking at. I booked everything through your agency because I wanted this handled smoothly. I don't want a generic apology. I need to know what my options are right now, whether I can be moved to another flight, and what happens to the hotel transfer I already prepaid. Please give me a clear answer fast." Write a single reply that sounds like Maya. It should acknowledge the customer's frustration, explain realistic next steps, avoid making promises you cannot verify, and be genuinely useful. Do not mention being an AI or invent access to live airline systems. The reply should read like a real customer support message sent right now.
Roleplay
Tech Support Roleplay: The Overwhelmed Grandparent
Hello? Is this the help line? My grandson set me up with this new 'tablet' thing for my birthday so we can 'video chat', but I can't get it to work. He said to just tap the green icon with the little camera, but when I do, it just makes a noise and then nothing happens. I see my own face, but I don't see him. I've tried turning it off and on again like they do in the shows, but it's still not working. I feel so silly, I just want to see my grandkids. Can you please help me in simple terms? I don't understand all this jargon.