Orivel

Summarize a Policy Memo with Balanced Tradeoffs

Compare model answers for this Summarization benchmark and review scores, judging comments, and related examples.

Task Prompt

Read the memo below and write a concise summary of 140 to 180 words for a city council member who has not read it. Your summary must cover the problem, the proposed pilot program, expected benefits, main risks or criticisms, and how success would be measured. Do not quote directly.

Memo:

Riverton's public buses have lost riders for six consecutive years, even though the city's population has grown. A transportation department review found several causes: routes are infrequent outside downtown, schedules are hard to understand, and buses are often delayed by traffic congestion. Low-income residents and older adults reported the greatest difficulty reaching jobs, clinics, and grocery stores without long waits or costly ride-hailing services.

In response, staff propose a two-year "Frequent Corridors" pilot. Instead of spreading service thinly across the entire network, the city would increase weekday frequency to every 10 minutes on five major corridors from 6 a.m. to 9 p.m. Two underused neighborhood routes would be replaced by on-demand shuttles that riders could book by phone or app. The plan would also add larger bus-stop signs, simplified maps, and a real-time arrival display at the central transfer station.

Supporters argue that riders value reliability and simplicity more than broad but infrequent coverage. They say concentrating resources on the busiest corridors could attract new riders, reduce missed transfers, and improve access to major employers and the community college. They also note that on-demand shuttles may serve low-density areas more efficiently than nearly empty fixed-route buses.

Critics raise several concerns. Some disability advocates worry that app-based booking could disadvantage riders without smartphones, although the proposal includes phone reservations. Labor representatives warn that the shuttle service could be outsourced later, potentially affecting union jobs. Environmental groups support transit investment overall but question whether replacing fixed routes with smaller vehicles might reduce total passenger capacity. Some residents also fear that neighborhoods losing direct bus lines will feel abandoned, even if average wait times fall.

The pilot is estimated to cost 8 million dollars over two years. Staff suggest funding it through a mix of state transit grants, parking revenue, and delaying a planned downtown streetscape project. They propose evaluating the pilot using ridership changes, average wait times, on-time performance, transfer success rates, customer satisfaction surveys, and access to essential destinations for low-income households. If the pilot fails to improve ridership and reliability within 18 months, staff recommend ending it early or redesigning it.

Task Context

The task tests whether the model can compress a policy-oriented passage while preserving key facts, tradeoffs, and evaluation criteria.
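
The prompt's 140 to 180 word limit can be checked mechanically. The sketch below is a minimal illustration, not the site's method: it counts whitespace-separated tokens, which may differ slightly from how a judge counts (hyphenated terms and numerals are the usual edge cases). The function names are hypothetical.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated tokens (assumed definition of a 'word')."""
    return len(text.split())


def within_limit(text: str, low: int = 140, high: int = 180) -> bool:
    """True if the summary length falls inside the inclusive range from the prompt."""
    return low <= word_count(text) <= high


if __name__ == "__main__":
    sample = "Riverton's buses have lost riders for six consecutive years."
    print(word_count(sample), within_limit(sample))
```

Pasting a full answer into `within_limit` gives a quick pass or fail before any qualitative judging.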

Judging Policy

A strong answer is accurate, concise, and well organized. It should clearly identify the transit problems, explain the main elements of the pilot, mention both potential advantages and notable criticisms, and include the proposed measures of success. It should avoid direct quotation, major omissions, unsupported opinions, and factual distortions. Good summaries prioritize the most decision-relevant information rather than minor details.

Answers

Riverton has experienced six straight years of declining bus ridership despite population growth, largely due to infrequent service outside downtown, confusing schedules, and traffic delays. Low-income residents and older adults are disproportionately affected, struggling to reach jobs, medical facilities, and essential services. To address this, city staff propose a two-year Frequent Corridors pilot costing eight million dollars, funded through state grants, parking revenue, and deferring a streetscape project. The plan would boost weekday frequency to every ten minutes on five key corridors, replace two low-ridership neighborhood routes with on-demand shuttles bookable by phone or app, and improve signage and real-time arrival information. Proponents believe concentrating resources on high-demand corridors will attract riders and improve reliability, while shuttles can serve low-density areas more efficiently. Critics raise concerns about accessibility for riders without smartphones, potential outsourcing of shuttle operations threatening union jobs, reduced passenger capacity, and neighborhood abandonment. Success would be measured by ridership, wait times, on-time performance, transfer rates, customer satisfaction, and equitable access, with an early exit option at eighteen months if targets are unmet.

Result

#1 | Winner

Winning Votes

3 / 3

Average Score

92

Judge Models

OpenAI GPT-5.4

Total Score

88

Overall Comments

This is a strong, concise summary that captures the memo’s central problem, the pilot design, the main arguments for and against it, funding, and the proposed evaluation approach. It is generally faithful to the source and well organized. Minor weaknesses are that it slightly generalizes some details, such as access to essential destinations for low-income households into broader equitable access, and it omits a few specific benefit examples like access to major employers and the community college.

Faithfulness

Weight 40%
88

The summary is highly accurate on the ridership decline, causes, affected groups, pilot elements, funding sources, criticisms, and evaluation timeline. It avoids major distortions. A small deduction is warranted because it slightly broadens the memo’s specific measure of access to essential destinations for low-income households into the more general phrase equitable access.

Coverage

Weight 20%
87

It covers all major required elements: the problem, proposed pilot, expected benefits, risks or criticisms, and success measures. It also includes funding and the early termination or redesign option. Coverage is not perfect because it leaves out some concrete benefit details, such as improved access to major employers and the community college, and does not mention the central transfer station specifically.

Compression

Weight 15%
91

The response is efficiently compressed while preserving substantial policy-relevant content. It fits the requested concise-summary style and avoids unnecessary detail. It is slightly dense, but still economical given the number of memo elements included.

Clarity

Weight 15%
89

The writing is clear, readable, and easy for a council member to follow. Key points are expressed directly and the tradeoffs are understandable. A minor limitation is that the long sentence listing many pilot features and concerns makes the middle portion somewhat packed.

Structure

Weight 10%
88

The summary follows a logical progression from problem to proposal to benefits, criticisms, and evaluation. This organization supports quick understanding. It is presented as one paragraph rather than more visibly segmented sections, but the internal flow remains strong.
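
The criterion weights and scores above are consistent with a weighted average: 0.40 × 88 + 0.20 × 87 + 0.15 × 91 + 0.15 × 89 + 0.10 × 88 = 88.4, which matches the listed total of 88. The page does not state the formula, so the following is an inferred sketch. The function name and the round-half-up rule are assumptions; half-up is the rule that reproduces the other totals on this page that fall on an exact .5.

```python
# Criterion weights as listed on the page, in whole percent.
WEIGHTS = {
    "faithfulness": 40,
    "coverage": 20,
    "compression": 15,
    "clarity": 15,
    "structure": 10,
}


def weighted_total(scores: dict[str, int]) -> int:
    """Weighted average of criterion scores, rounded half up (assumed rule)."""
    # Work in integer hundredths so exact .5 cases are not disturbed by float error.
    hundredths = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return (hundredths + 50) // 100


first_judge = {
    "faithfulness": 88,
    "coverage": 87,
    "compression": 91,
    "clarity": 89,
    "structure": 88,
}

if __name__ == "__main__":
    print(weighted_total(first_judge))  # 88
```

Python's built-in `round` uses round-half-to-even, so it would turn 98.5 into 98; that is why the sketch rounds with integer arithmetic instead.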

Total Score

89

Overall Comments

This is a strong, well-crafted summary that accurately and efficiently covers all major elements of the memo. It correctly identifies the problem, the pilot's structure and cost, funding sources, key benefits, notable criticisms, and evaluation metrics. The paraphrasing is faithful without direct quotation, and the writing is clear and organized. The summary falls within the required word count range. Minor weaknesses include a slightly dense final sentence that packs in many metrics at once, and the lack of explicit mention of the community college or major employers as access destinations, though these are minor omissions. Overall, this is a high-quality response that would serve a council member well.

Faithfulness

Weight 40%
92

The summary accurately reflects the memo's content without distortion or fabrication. All key facts are correctly represented: six years of ridership decline, the five corridors with 10-minute frequency, two routes replaced by on-demand shuttles, the eight million dollar cost, funding sources, and the 18-month early exit clause. The criticisms and benefits are faithfully paraphrased. No unsupported claims are introduced.

Coverage

Weight 20%
88

All five required elements are present: the problem, the pilot program, expected benefits, main risks or criticisms, and success metrics. The summary captures the most decision-relevant details including equity concerns, funding mechanism, and the early termination option. A minor gap is the omission of specific access destinations like the community college and major employers, which were highlighted in the memo as key beneficiaries.

Compression

Weight 15%
90

The summary is concise and falls within the 140 to 180 word target range. It avoids unnecessary repetition and minor details while preserving all critical information. The compression is efficient without sacrificing substance, demonstrating strong editorial judgment about what to include and what to omit.

Clarity

Weight 15%
85

The writing is generally clear and accessible for a non-specialist reader such as a council member. Most sentences are well-constructed. The final sentence is somewhat dense, listing six evaluation metrics in a single clause, which slightly reduces readability. Otherwise, the language is precise and professional throughout.

Structure

Weight 10%
85

The summary follows a logical flow that mirrors the memo's structure: problem, pilot details, benefits, criticisms, and evaluation. The progression is natural and easy to follow. It reads as a coherent paragraph rather than a disjointed list, which suits the essay format. A slight improvement could come from a clearer transitional signal before the criticisms section.

Total Score

99

Overall Comments

The summary is excellent. It is accurate, comprehensive, and well-structured, perfectly adhering to all prompt requirements. It effectively covers the problem, the proposed solution, its pros and cons, and the evaluation criteria within the specified word count. The clarity and logical flow make it highly suitable for its intended audience of a city council member.

Faithfulness

Weight 40%
100

The summary perfectly reflects the source memo's content without any factual errors or distortions. It accurately captures the nuances of the problem, the pilot's details, the arguments from both supporters and critics, and the proposed success metrics.

Coverage

Weight 20%
100

The answer comprehensively addresses all five required components of the prompt: the problem, the pilot program, benefits, risks/criticisms, and success measurement. It also includes key contextual details like cost and funding, which adds to its completeness.

Compression

Weight 15%
90

The summary is very efficient, packing all the essential information into 179 words, which is just within the 180-word limit. It successfully paraphrases the source material and omits minor details to focus on the most decision-relevant facts.

Clarity

Weight 15%
100

The language is exceptionally clear, concise, and professional. It is perfectly suited for a city council member, presenting the complex policy tradeoffs in an easily digestible format without jargon.

Structure

Weight 10%
100

The summary is structured logically, moving sequentially from the problem statement to the proposed solution, then to the balanced tradeoffs (benefits vs. risks), and concluding with the evaluation plan. This organization makes the information very easy to follow.

Riverton's public bus system suffers from six years of declining ridership, attributed to infrequent routes, confusing schedules, and traffic delays. This particularly impacts low-income and older residents seeking access to essential services. In response, staff propose a two-year "Frequent Corridors" pilot program. This initiative would increase weekday bus frequency to every 10 minutes on five main routes, while replacing two underused lines with on-demand shuttles accessible via phone or app. The plan also includes improved signage and real-time arrival displays. Supporters anticipate attracting new riders through enhanced reliability, reducing missed transfers, and improving access to key destinations. Conversely, critics express concerns about app accessibility for all users, potential job outsourcing for shuttle services, and the possibility of reduced overall passenger capacity. Neighborhoods losing direct lines also fear feeling neglected. The $8 million pilot's success will be evaluated based on changes in ridership, average wait times, on-time performance, transfer success rates, customer satisfaction, and improved access for low-income households.

Result

#2

Winning Votes

0 / 3

Average Score

88

Judge Models

OpenAI GPT-5.4

Total Score

87

Overall Comments

This is a strong, concise summary that captures the memo’s core problem, the main components of the pilot, major expected benefits, key criticisms, and the proposed evaluation metrics. It is generally accurate and easy to follow. The main weakness is some loss of nuance and a few omitted decision-relevant details, especially the funding approach, the central transfer station detail, and the memo’s proposed 18-month off-ramp if results are weak.

Faithfulness

Weight 40%
87

The summary is highly consistent with the memo and avoids obvious distortion. It correctly reports the ridership decline, causes, pilot design, major benefits, criticisms, cost, and evaluation measures. Minor fidelity loss comes from small omissions such as the specific service window of 6 a.m. to 9 p.m., the mention of major employers and the community college, and the memo’s recommendation to end or redesign the pilot within 18 months if it underperforms.

Coverage

Weight 20%
82

It covers the required elements well: problem, proposed pilot, expected benefits, risks or criticisms, and success measures. However, it omits some relevant details that would strengthen a policy memo summary, especially the funding sources, the simplified maps and central transfer station context, and the explicit early termination or redesign condition. Coverage is solid but not fully comprehensive.

Compression

Weight 15%
90

The response is efficiently condensed and stays focused on high-value information without unnecessary detail. It reads like a true summary rather than a paraphrase of every paragraph. It appears to fit the requested length range while preserving most of the essential content.

Clarity

Weight 15%
88

The writing is clear, readable, and easy for a council member to scan quickly. Cause, proposal, benefits, criticisms, and evaluation are all understandable. A slightly sharper distinction between supporter claims and staff evaluation plans would improve precision, but overall clarity is strong.

Structure

Weight 10%
86

The summary is logically organized, moving from the problem to the pilot, then benefits, criticisms, and evaluation. This sequence supports quick comprehension. It would be even stronger with slightly more explicit signaling of tradeoffs and a clearer concluding sentence about how decision-makers should interpret the pilot’s success threshold.

Total Score

81

Overall Comments

This is a strong, well-organized summary that accurately captures the core elements of the memo. It covers the problem, the pilot program's main components, benefits, criticisms, and evaluation metrics without direct quotation. The word count falls within the required 140–180 word range. The summary is faithful to the source, avoids distortion, and presents a balanced view of tradeoffs. Minor weaknesses include the omission of the funding mechanism (state grants, parking revenue, delayed streetscape project) and the early-termination clause at 18 months, both of which are decision-relevant details for a council member. The phrase 'Frequent Corridors' is placed in quotation marks, which borders on direct quotation but is acceptable as a proper name. Overall, this is a competent and concise summary that serves its intended audience well.

Faithfulness

Weight 40%
85

The summary accurately reflects the memo's content without introducing distortions or unsupported claims. All stated facts align with the source. The only notable omission affecting faithfulness is the absence of the funding breakdown and the 18-month early-termination provision, which are factual details present in the memo.

Coverage

Weight 20%
75

The summary covers the five required elements: problem, pilot program, benefits, risks, and success metrics. However, it omits the funding mechanism (state grants, parking revenue, deferred streetscape project) and the conditional early-exit clause, both of which are decision-relevant for a council member evaluating the proposal.

Compression

Weight 15%
80

The summary is concise and falls within the 140–180 word target. It avoids padding and unnecessary detail while retaining the most important points. The compression is efficient without sacrificing clarity or accuracy.

Clarity

Weight 15%
80

The writing is clear, professional, and accessible to a non-specialist reader such as a council member. Sentences are well-constructed and the language is appropriately formal. The transition from benefits to criticisms using 'Conversely' is smooth and effective.

Structure

Weight 10%
80

The summary follows a logical sequence: problem, proposal, benefits, criticisms, and evaluation. This mirrors the memo's own structure and makes it easy to follow. There is no explicit section labeling, but the flow is coherent and well-organized for a prose summary.

Total Score

96

Overall Comments

The summary is outstanding. It successfully captures all the required elements of the memo—problem, proposed pilot, benefits, risks, and success metrics—with excellent accuracy and faithfulness to the source text. The structure is logical and easy to follow, and the language is clear, professional, and well-suited for the intended audience. It adheres perfectly to the length constraint, demonstrating strong compression skills by prioritizing the most critical information. There are no significant weaknesses.

Faithfulness

Weight 40%
95

The summary is exceptionally faithful to the source memo. It accurately represents the problem, the details of the pilot program, the arguments for it, the multiple criticisms against it, and the evaluation criteria. There are no distortions, misinterpretations, or added information. It correctly notes that shuttles are accessible by phone, not just an app.

Coverage

Weight 20%
100

The summary provides complete coverage of all five components requested in the prompt: the problem, the proposed pilot, expected benefits, main risks/criticisms, and success metrics. It even includes the cost of the pilot, which is a relevant detail. No essential information has been omitted.

Compression

Weight 15%
95

The model demonstrates excellent compression, delivering a comprehensive summary in 174 words, well within the 140-180 word target. It effectively condenses multiple paragraphs of the memo into concise sentences without losing the core meaning or critical details. The information selection is highly effective for a high-level briefing.

Clarity

Weight 15%
90

The summary is very clearly written. The language is professional and precise, making it easy for the target audience (a city council member) to understand the key issues and tradeoffs at a glance. The sentences are well-constructed and free of ambiguity.

Structure

Weight 10%
100

The structure is flawless. It follows a logical progression that perfectly mirrors the prompt's requirements: starting with the problem, moving to the proposed solution, then presenting the pros and cons, and concluding with the evaluation plan. This organization makes the information highly accessible and easy to digest.

Comparison Summary

Final rank order is determined by judge-wise rank aggregation (average rank + Borda tie-break). Average score is shown for reference.
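
The page names the aggregation method but not its details, so the following is a sketch under stated assumptions rather than the site's implementation. Assumed here: each judge ranks answers by total score, tied scores share the best rank, an answer's Borda points are the number of answers it strictly outscores per judge, and a winning vote is a judge placing the answer first. The function name and answer labels are hypothetical; the scores are the per-judge totals shown on this page.

```python
def aggregate_ranks(scores_by_judge: list[dict[str, int]]):
    """Order answers by average rank across judges, breaking ties by Borda points."""
    answers = list(scores_by_judge[0])
    avg_rank, borda, votes = {}, {}, {}
    for a in answers:
        # Rank within one judge = 1 + number of answers scored strictly higher.
        ranks = [1 + sum(j[b] > j[a] for b in answers) for j in scores_by_judge]
        avg_rank[a] = sum(ranks) / len(ranks)
        # Borda points = number of answers scored strictly lower, summed over judges.
        borda[a] = sum(sum(j[b] < j[a] for b in answers) for j in scores_by_judge)
        votes[a] = sum(r == 1 for r in ranks)
    order = sorted(answers, key=lambda a: (avg_rank[a], -borda[a]))
    return order, avg_rank, borda, votes


# Total scores per judge for the two answers, as listed on this page.
judges = [
    {"answer_1": 88, "answer_2": 87},
    {"answer_1": 89, "answer_2": 81},
    {"answer_1": 99, "answer_2": 96},
]

if __name__ == "__main__":
    order, avg_rank, borda, votes = aggregate_ranks(judges)
    print(order, avg_rank, votes)
```

With these scores every judge prefers the first answer, so the tie-break is never used. It would only matter if two answers ended up with the same average rank, for example when judges disagree or give tied totals.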

Judges: 3

#1 | Winner

Winning Votes: 3 / 3

Average Score: 92

#2

Winning Votes: 0 / 3

Average Score: 88