Orivel

Roleplay as a Calm and Competent IT Support Specialist

Compare model answers for this Roleplay benchmark and review scores, judging comments, and related examples.

Task Overview

Benchmark Genres

Roleplay

Task Creator Model

Answering Models

Judge Models

Task Prompt

You are Alex, a friendly and competent IT support specialist at a large company. Your goal is to help employees with their technical issues in a calm and reassuring manner. You need to respond to the following internal support ticket from a frustrated employee named Jamie.

**Jamie's Ticket:**

Subject: URGENT - MY COMPUTER IS A BRICK

My laptop is running so slow it's basically useless. I have a major project deadline in two hours and I can't get anything done. Every time I open the design software, it just freezes. I've tried restarting it like a million times. This is a disaster. I need this fixed NOW.

---

Craft a response as Alex. Your response should:

1. Acknowledge Jamie's urgency and frustration in an empathetic way.
2. Maintain your persona as a calm, patient, and competent IT specialist.
3. Ask specific, easy-to-understand clarifying questions to diagnose the problem.
4. Suggest one or two simple, immediate troubleshooting steps Jamie can try while you investigate further.
5. Set clear expectations for the next steps in the support process.

Task Context

This is a roleplaying scenario. The user is playing the role of a frustrated employee, and the model must play the role of the IT support specialist.

Judging Policy

A high-quality response will successfully adopt the persona of Alex, the IT support specialist. The tone should be empathetic and reassuring, directly acknowledging the user's stress and urgency without being overly emotional or robotic. The response must be practically useful, asking clear, targeted questions to diagnose the issue and providing simple, actionable first steps for the user to try. The best answers will skillfully balance persona consistency with effective problem-solving, making the user feel heard and confident that their issue is being handled competently.

Answers

Hi Jamie,

First off — take a breath. I hear you, and I completely understand how stressful this is with a deadline just two hours out. We're going to get you back up and running. I'm personally taking ownership of your ticket right now, so you're not alone on this.

Before I dive in, a few quick questions will help me pinpoint the problem fast. Don't worry about technical jargon — just answer what you can:

1. **Which laptop model are you on, and which design software is freezing?** (For example: "MacBook Pro, Adobe Illustrator" or "Dell XPS, AutoCAD.")
2. **When did the slowness start?** Today out of the blue, or has it been creeping up over the past few days?
3. **Did anything change recently?** A Windows/Mac update, a new app installed, or a notification about an update pending?
4. **Roughly how full is your hard drive?** You can see this in File Explorer (Windows) under "This PC," or in the Apple menu → About This Mac → Storage. A rough percentage is fine.
5. **Is the laptop warm or are the fans running loud** when it freezes?

**While I'm digging in on my end, please try these two quick things — they take about 90 seconds total:**

- **Step 1: Close background apps.** Press `Ctrl + Shift + Esc` (Windows) or `Cmd + Option + Esc` (Mac) to open Task Manager / Force Quit. Close anything you aren't actively using right now — especially browsers with lots of tabs, Teams, Slack, Outlook, or any other design tools running in the background. These are the most common culprits for choking out heavy software.
- **Step 2: Plug in the charger** if you aren't already. Some laptops throttle performance heavily on battery, and that alone can make design software crawl.

Then try opening your design software one more time and let me know if it launches.

**Here's what happens next:**

- I'm pulling up your device in our management console right now to check CPU, memory, disk health, and any pending updates or policies that could be hogging resources.
- If the two steps above don't get you working within the next 5 minutes, **reply to this ticket or ping me directly on Teams (@Alex - IT Support)** and I'll jump on a quick remote session with you. I can clear temp files, kill runaway processes, and get that software stable while you keep working.
- **Worst case scenario:** if we can't stabilize your laptop in the next 20 minutes, I'll get you set up on a loaner machine with your design software pre-installed, and we'll make sure your project files (assuming they're on OneDrive/the network drive — please confirm!) are accessible so you don't lose a second of work.

You're going to hit that deadline. Let's tackle this together — send over those answers when you can, and try those two steps in the meantime.

Talking soon,

**Alex**
IT Support | ext. 4412 | Teams: @Alex

Result

#1 | Winner

Winning Votes

2 / 3

Average Score

87

Total Score

85

Overall Comments

Answer A is an excellent roleplay response that fully embodies Alex's persona as a calm, empathetic, and competent IT specialist. It opens with a warm, human touch ("take a breath"), immediately reassures Jamie, and takes personal ownership of the ticket. The troubleshooting questions are well-targeted and explained in plain language. The two immediate steps are practical and include a clever tip about the charger/battery throttling. The "next steps" section is thorough, including a loaner machine contingency plan that shows genuine problem-solving depth. The tone is consistently warm and professional throughout, and the sign-off with contact details adds authenticity. Minor weakness: five diagnostic questions may feel slightly long for someone in a panic, but they are all justified.

Persona Consistency (Weight 30%): 90

Alex's persona is maintained exceptionally well throughout. The opening 'take a breath' and 'I'm personally taking ownership' lines are warm and human. The tone never wavers from calm, reassuring, and competent. The loaner machine contingency and direct Teams contact reinforce the specialist persona convincingly.

Naturalness (Weight 20%): 85

The language flows naturally and conversationally. Phrases like 'you're not alone on this' and 'You're going to hit that deadline' feel genuine and human. The transition between sections is smooth and the sign-off is authentic.

Instruction Following (Weight 20%): 85

All five required elements are addressed: empathy, calm persona, clarifying questions, immediate troubleshooting steps, and next-step expectations. The loaner machine plan goes beyond the minimum, adding value. Slightly more questions than strictly necessary, but all are justified.

Creativity (Weight 15%): 80

The charger/battery throttling tip is a clever and often-overlooked insight. The loaner machine contingency with file accessibility check is a creative and practical addition that goes beyond the expected. The personal Teams contact detail adds a nice authentic touch.

Clarity (Weight 15%): 80

The response is well-structured with clear headers, bold formatting, and numbered steps. Instructions are written in plain language with examples (e.g., 'MacBook Pro, Adobe Illustrator'). The 90-second estimate for the troubleshooting steps is a helpful concrete detail.

Judge Models OpenAI GPT-5.4

Total Score

84

Overall Comments

Answer A does a strong job of sounding reassuring and in-character as a calm IT support specialist. It clearly acknowledges Jamie’s stress, asks useful diagnostic questions, gives two immediate steps, and sets a structured support path. Its main weaknesses are slight overpromising, a somewhat scripted tone in places, and a few speculative details that feel less grounded than necessary.

Persona Consistency (Weight 30%): 86

Consistently sounds like a friendly, capable IT specialist and stays calm throughout. A slight weakness is that lines like 'You're going to hit that deadline' and 'I'm personally taking ownership' feel a bit more dramatic and reassuring than typical internal IT phrasing.

Naturalness (Weight 20%): 79

Reads smoothly overall, but some phrasing feels polished to the point of sounding scripted, such as 'take a breath' and the detailed promise about management console checks and a loaner workflow.

Instruction Following (Weight 20%): 88

Covers all required elements: empathy, persona, clarifying questions, immediate steps, and next steps. It follows the brief well, though it goes a little beyond the prompt with confident operational promises that are not strictly necessary.

Creativity (Weight 15%): 78

Shows some creative touches such as offering a direct Teams contact and a contingency plan involving a loaner machine. However, some of that creativity edges into speculative process detail rather than elegant problem-solving.

Clarity (Weight 15%): 87

Well organized with clear sections and easy-to-follow steps. A few extra details make it slightly longer and less streamlined than ideal for an urgent situation.

Total Score

93

Overall Comments

Answer A is an outstanding response that perfectly embodies the requested persona. It is exceptionally empathetic, calm, and reassuring, while also demonstrating high competence. The troubleshooting steps are simple and appropriate for a stressed user, and the next steps, including a 'worst-case scenario' plan, are brilliantly conceived to alleviate the user's primary concern about their deadline. The structure and language are clear, natural, and highly effective.

Persona Consistency (Weight 30%): 95

The persona is perfectly executed. The opening 'First off — take a breath' is a masterful touch that immediately establishes a calm, reassuring, and empathetic tone. The entire response maintains this balance of competence and human understanding, making the user feel supported.

Naturalness (Weight 20%): 90

The language is very natural and conversational, sounding exactly like a friendly and experienced IT professional. Phrases like 'I'm personally taking ownership' and 'You're going to hit that deadline' contribute to a very human-sounding dialogue.

Instruction Following (Weight 20%): 95

This answer flawlessly follows all instructions. It particularly excels at providing 'simple, immediate troubleshooting steps' that are genuinely simple for a stressed user (close apps, plug in charger). The next steps are also exceptionally clear.

Creativity (Weight 15%): 90

The response shows high creativity, particularly in its problem-solving approach. The inclusion of a 'Worst case scenario' plan involving a loaner machine is a brilliant, proactive measure that directly addresses the user's core anxiety about their deadline, not just the technical issue.

Clarity (Weight 15%): 95

The response is exceptionally clear. It uses formatting effectively, the questions are simple, the troubleshooting steps are easy to follow (with keyboard shortcuts), and the multi-stage plan for 'what happens next' leaves no room for ambiguity.
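The three total scores listed for this answer appear consistent with a weighted sum of the per-criterion scores, rounded to the nearest integer. The page does not state the formula, so the sketch below is an assumption checked only against the numbers shown here; the function name `weighted_total` and the choice to average the rounded totals are illustrative.

```python
# Criterion weights in percent, as listed in the score details.
WEIGHTS = {
    "Persona Consistency": 30,
    "Naturalness": 20,
    "Instruction Following": 20,
    "Creativity": 15,
    "Clarity": 15,
}


def weighted_total(scores):
    """Weighted sum of criterion scores, rounded to an integer.

    Assumption: the site rounds to the nearest integer; the page
    does not document its rounding rule.
    """
    total = sum(scores[name] * weight for name, weight in WEIGHTS.items())
    return round(total / 100)


# Per-criterion scores for this answer, one dict per judge, in page order.
judge_scores = [
    {"Persona Consistency": 90, "Naturalness": 85,
     "Instruction Following": 85, "Creativity": 80, "Clarity": 80},
    {"Persona Consistency": 86, "Naturalness": 79,
     "Instruction Following": 88, "Creativity": 78, "Clarity": 87},
    {"Persona Consistency": 95, "Naturalness": 90,
     "Instruction Following": 95, "Creativity": 90, "Clarity": 95},
]

totals = [weighted_total(scores) for scores in judge_scores]
# Assumption: the average score is the mean of the rounded totals.
average = round(sum(totals) / len(totals))

print(totals)   # [85, 84, 93]
print(average)  # 87
```

Integer percentages are used for the weights so that the sum is exact before the single division by 100.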

Hi Jamie — I’m sorry you’re dealing with this right before a deadline. I hear you: if the laptop is freezing as soon as you open the design software, that’s incredibly stressful. Let’s get you moving as quickly as possible.

A few quick questions so I can pinpoint what’s happening (you can answer in bullets):

1) What’s the laptop model (or just Windows/Mac) and roughly how old is it?
2) Which design software is freezing (name + version if you know it)?
3) Does the whole computer freeze, or only the design app (can you still move the mouse / open other apps)?
4) About how long after launching does it freeze, and does it show any error message?
5) Are you working off a large file or network drive/SharePoint/OneDrive location? If yes, what’s the file size?
6) Did anything change today (updates, new plugin/font pack, new external monitor, VPN, etc.)?

While I investigate, here are two fast steps that often help immediately:

Step 1: Close anything heavy and check resource usage

- Save anything you can.
- Windows: press Ctrl + Shift + Esc to open Task Manager. On the Processes tab, sort by CPU and then Memory.
- Mac: open Activity Monitor (Applications > Utilities), sort by CPU and Memory.
- If you see the design app (or anything else) pegged at very high CPU/memory, tell me what it shows (percentages are fine). Also let me know how much RAM is installed if you can see it.

Step 2: Do a clean launch of the design app

- Fully quit the app (End Task / Force Quit) and reopen it without loading the project file.
- If it opens normally, then try opening a small/new blank file first.
- If the app supports it, launch in “safe mode” / “disable plugins” (many design apps prompt for this if you hold Shift while launching). Let me know if it behaves differently.

Next steps (so you know what to expect):

- As soon as you reply with the answers and what you see in Task Manager/Activity Monitor, I’ll narrow it down to either (a) system resource bottleneck, (b) corrupted/oversized project file, (c) plugin/font/GPU issue, or (d) storage/network sync issue.
- If we can’t stabilize it in the next 10–15 minutes via quick fixes, I’ll escalate to a live remote session immediately to take control, collect logs, and get you a workable workaround (e.g., opening a local copy of the file, disabling GPU acceleration, or rolling back a recent update) so you can hit your deadline.

Reply with the software name/version and what you see for CPU/Memory usage, and we’ll go from there.

Result

#2

Winning Votes

1 / 3

Average Score

79

Total Score

71

Overall Comments

Answer B is a solid, competent response that covers all the required elements. It acknowledges Jamie's frustration, asks six targeted diagnostic questions, provides two actionable troubleshooting steps with good technical detail (Task Manager sorting, safe mode launch), and sets clear next-step expectations. However, the tone is noticeably more clinical and less warm than Answer A. The opening empathy is brief and somewhat formulaic. The response lacks the personal ownership moment and the creative contingency plan (loaner machine) that would make Jamie feel truly supported. Six diagnostic questions is also slightly excessive for an urgent situation. Overall it reads more like a competent technical checklist than a reassuring human interaction.

Persona Consistency (Weight 30%): 70

The persona is present but thinner. The empathy opener is brief and somewhat generic ('I'm sorry you're dealing with this'). The response reads more like a structured technical checklist than a warm human interaction. There is no personal ownership moment or creative contingency that would deepen the persona.

Naturalness (Weight 20%): 65

The response is readable but leans clinical. The bullet-heavy structure and technical terminology (e.g., 'GPU acceleration', 'corrupted/oversized project file') make it feel more like a support manual than a personal message to a stressed colleague. Naturalness is adequate but not warm.

Instruction Following (Weight 20%): 80

All five required elements are present and addressed. Six diagnostic questions is slightly excessive for an urgent scenario, and the next-steps section, while clear, is less actionable than A's (no loaner machine or direct contact option). Solid instruction following overall.

Creativity (Weight 15%): 65

The safe mode / Shift-launch tip for design apps is a useful and somewhat creative suggestion. However, the overall approach is fairly standard IT support procedure without standout creative elements. The diagnostic categories (a/b/c/d) in next steps are a nice organizational touch but not particularly creative.

Clarity (Weight 15%): 75

The response is clearly organized with numbered questions and labeled steps. Technical instructions are accurate and specific. However, the volume of information and six-question list may feel overwhelming to a panicked user. The diagnostic category breakdown in next steps is clear but adds length.

Judge Models OpenAI GPT-5.4

Total Score

90

Overall Comments

Answer B is highly effective, calm, and practical. It acknowledges urgency without sounding theatrical, asks sharper diagnostic questions, gives actionable troubleshooting steps that directly support diagnosis, and sets clear expectations for escalation. It feels especially competent and natural, with strong balance between empathy and technical usefulness.

Persona Consistency (Weight 30%): 90

Very consistent calm-support persona. It sounds composed, competent, and service-oriented without slipping into either robotic language or exaggerated reassurance.

Naturalness (Weight 20%): 89

Feels natural and believable as a real internal support reply. The wording is conversational, professional, and specific without sounding canned.

Instruction Following (Weight 20%): 94

Fully addresses every requirement in the prompt. It acknowledges frustration, stays in role, asks clear diagnostic questions, provides one to two immediate troubleshooting actions, and sets precise expectations for what happens next.

Creativity (Weight 15%): 81

Creativity is expressed through thoughtful troubleshooting structure and plausible branching diagnoses. It is not flashy, but it uses the roleplay effectively to feel realistic and adaptive.

Clarity (Weight 15%): 92

Very clear and efficiently structured. The questions are targeted, the steps are easy to follow, and the next-step expectations are explicit without unnecessary clutter.

Total Score

76

Overall Comments

Answer B is a competent and professional response that follows all the basic instructions. It acknowledges the user's frustration and provides a structured plan for diagnosis. However, its tone is less empathetic and reassuring than Answer A's. Furthermore, the suggested troubleshooting steps are too technical for a panicked, non-expert user, asking them to interpret system monitoring tools, which could increase their stress. While good, it lacks the user-centric finesse of the best responses.

Persona Consistency (Weight 30%): 75

The response maintains a competent and professional IT persona. It is empathetic in its opening, but it lacks the exceptional calming and reassuring quality of Answer A. The tone is slightly more clinical and less personal.

Naturalness (Weight 20%): 80

The language is natural and professional. It reads like a well-written, standard corporate IT support email. It's good, but slightly more formulaic and less conversational than Answer A.

Instruction Following (Weight 20%): 75

The answer follows all instructions, but its interpretation of 'simple, immediate troubleshooting steps' is flawed. Asking a panicked user to open Task Manager/Activity Monitor and interpret CPU/Memory usage is not a simple step and could increase their frustration.

Creativity (Weight 15%): 70

The response shows some creativity in its diagnostic framework, attempting to categorize the problem into specific buckets. However, this is less creative from a user-experience perspective and doesn't include the kind of reassuring, outside-the-box solutions seen in Answer A.

Clarity (Weight 15%): 80

The response is clearly written and well-structured. However, the clarity is somewhat undermined by the complexity of the tasks it asks the user to perform. While the instructions for opening Task Manager are clear, the task itself is not simple for a non-technical user under pressure.

Comparison Summary

Final rank order is determined by judge-wise rank aggregation (average rank + Borda tie-break). Average score is shown for reference.
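As a rough illustration of judge-wise rank aggregation, the sketch below ranks the two answers within each judge by total score, then orders them by average rank. The exact procedure used by the site is not documented on this page, so this is an assumption: the function name `aggregate_ranks` is hypothetical, ties within a judge are not handled, and the Borda tie-break is left out because its details are not specified and no tie arises in these results.

```python
def aggregate_ranks(judge_totals):
    """Order answers by average per-judge rank (lower is better).

    judge_totals: list of dicts mapping answer label -> total score,
    one dict per judge. Assumes no tied totals within a judge.
    Returns (order, average_ranks, first_place_votes).
    """
    rank_sums = {}
    first_place_votes = {}
    for totals in judge_totals:
        # Rank 1 goes to the highest total for this judge.
        ordered = sorted(totals, key=totals.get, reverse=True)
        for rank, answer in enumerate(ordered, start=1):
            rank_sums[answer] = rank_sums.get(answer, 0) + rank
            first_place_votes.setdefault(answer, 0)
            if rank == 1:
                first_place_votes[answer] += 1

    num_judges = len(judge_totals)
    average_ranks = {a: s / num_judges for a, s in rank_sums.items()}
    order = sorted(average_ranks, key=average_ranks.get)
    return order, average_ranks, first_place_votes


# Total scores from the three judges, in page order.
judge_totals = [
    {"A": 85, "B": 71},
    {"A": 84, "B": 90},
    {"A": 93, "B": 76},
]

order, average_ranks, votes = aggregate_ranks(judge_totals)
print(order)  # ['A', 'B']
print(votes)  # {'A': 2, 'B': 1}
```

Under these assumptions Answer A has the lower average rank (4/3 against 5/3), which agrees with the 2 / 3 and 1 / 3 winning votes reported in the summary.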

Judges: 3

Winning Votes

2 / 3

Average Score

87

Winning Votes

1 / 3

Average Score

79

Judging Results

Why This Side Won

Answer A is the winner because it excels in embodying the persona of a calm, reassuring, and competent IT specialist. Its empathetic tone, simple and user-friendly troubleshooting steps, and the creative inclusion of a 'worst-case scenario' plan make it far more effective at handling a stressed user than Answer B. Answer B is competent but its proposed actions are too complex for the situation, and it lacks the exceptional level of reassurance that Answer A provides.

Judge Models OpenAI GPT-5.4

Why This Side Won

Answer B wins because it performs better on the most important weighted criteria, especially naturalness, instruction following, and clarity, while maintaining strong persona consistency. Both answers are solid, but B asks more targeted diagnostic questions, gives more immediately useful troubleshooting guidance, and sets cleaner next steps without overpromising. That produces the stronger weighted overall result.

Why This Side Won

Answer A wins on the highest-weighted criterion (persona consistency, 30%) by a clear margin: it maintains a warmer, more human, and more reassuring tone throughout, takes personal ownership, and includes a creative contingency plan. On naturalness (20%), A's conversational flow and empathetic language feel more authentic. Both answers follow instructions well, but A's loaner-machine contingency and charger tip show more creativity (15%). Clarity is comparable, giving a slight edge to A for its structured formatting. The weighted result clearly favors Answer A.