If you regularly use Claude for development or prompt tuning, the most noteworthy part of this recent update is that the Workbench has turned “writing prompts” and “comparing prompts” into built-in tools, and the API side has also opened up a longer output limit. This article walks through— in actual usage order—how to use Claude’s prompt generator, evaluation mode, extended output, and the newly launched release notes.
Claude Sonnet 3.5 Extended Output: from 4096 to 8192 tokens
In the Claude API, Claude Sonnet 3.5’s maximum output token limit has doubled from 4096 to 8192. For long-form summarization, code generation, or tasks that require “providing all steps in full,” Claude is less likely to get cut off halfway through.
Enabling it is straightforward: add the request header anthropic-beta with the value max-tokens-3-5-sonnet-2024-07-15. Then set max_tokens the way you normally would, and Claude will operate under the new limit policy.
Workbench Prompt Generator: Describe the task first, then let Claude write the prompt
The new prompt generator in the Claude Console Workbench follows the idea of “you state the requirements, Claude helps you write a reusable prompt.” For example, if you simply describe “categorize and handle incoming customer support requests,” Claude will generate a more complete instruction template, often adding an output format and boundary conditions.
This feature is suited to two types of people: teams that often need to hand off reusable requirements to colleagues, and developers building automation flows who feel their prompts are unstable. Use Claude’s generated version as a draft, then fine-tune it to your business fields—it saves more time than writing from scratch.
Evaluation Mode: Compare multiple prompts side by side and score Claude’s outputs
Evaluation mode in the Workbench lets you display the outputs of two or more prompts side by side and rate Claude’s results on a 5-point scale. It addresses a very practical pain point: for the same task, whether changing a single sentence actually improves things used to be judged only by “feel.”


