
Claude Workbench Update: Prompt Generator, Evaluation Mode, and Release Notes Explained in One Article

2/22/2026
Claude

If you regularly use Claude for development or prompt tuning, the most noteworthy part of this recent update is that the Workbench has turned "writing prompts" and "comparing prompts" into built-in tools, and the API side has also opened up a longer output limit. This article walks through, in the order you would actually use them, Claude's prompt generator, evaluation mode, extended output, and the newly launched release notes.

Claude 3.5 Sonnet Extended Output: from 4096 to 8192 tokens

In the Claude API, Claude 3.5 Sonnet's maximum output token limit has doubled from 4096 to 8192. For long-form summarization, code generation, or tasks that require "providing all steps in full," Claude is less likely to get cut off halfway through.

Enabling it is straightforward: add the request header anthropic-beta with the value max-tokens-3-5-sonnet-2024-07-15. Then set max_tokens the way you normally would, and Claude will operate under the new limit policy.
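Putting the two steps above together, a minimal sketch of a Messages API request that opts in to the 8192-token limit might look like this. The endpoint, version header, and model ID reflect Anthropic's public documentation at the time of the beta; treat them as assumptions and check the current docs before relying on them.

```python
API_URL = "https://api.anthropic.com/v1/messages"

def build_extended_output_request(api_key: str, prompt: str) -> dict:
    """Return the headers and JSON body for a long-output request."""
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        # Opt-in beta header that raises the output cap to 8192 tokens.
        "anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15",
        "content-type": "application/json",
    }
    body = {
        "model": "claude-3-5-sonnet-20240620",
        "max_tokens": 8192,  # set max_tokens as usual; the beta allows the higher cap
        "messages": [{"role": "user", "content": prompt}],
    }
    return {"url": API_URL, "headers": headers, "json": body}

req = build_extended_output_request("sk-...", "Summarize this report in full.")
# To send it: requests.post(req["url"], headers=req["headers"], json=req["json"])
```

Without the beta header, a max_tokens value above 4096 would be rejected; with it, the same request shape works unchanged.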

Workbench Prompt Generator: Describe the task first, then let Claude write the prompt

The new prompt generator in the Claude Console Workbench follows the idea of “you state the requirements, Claude helps you write a reusable prompt.” For example, if you simply describe “categorize and handle incoming customer support requests,” Claude will generate a more complete instruction template, often adding an output format and boundary conditions.

This feature is suited to two types of people: teams that often need to hand off reusable requirements to colleagues, and developers building automation flows who feel their prompts are unstable. Use Claude's generated version as a draft, then adapt it to your own domain and fields; it saves more time than writing from scratch.
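The generator itself lives in the console UI and its internal prompt is not public, but the underlying idea, asking Claude to write a prompt rather than writing it yourself, can be sketched as a plain meta-prompt sent through the normal API. The wording below is hypothetical, not the console's actual template.

```python
def make_meta_prompt(task_description: str) -> str:
    """Wrap a plain task description in a meta-prompt that asks Claude
    to produce a reusable prompt template (hypothetical wording)."""
    return (
        "You are a prompt engineer. Write a reusable prompt template for the "
        f"following task:\n\n{task_description}\n\n"
        "Include: a role and context, step-by-step instructions, an explicit "
        "output format, and how to handle edge cases or ambiguous input. "
        "Use {{placeholders}} for fields that vary per request."
    )

meta = make_meta_prompt("Categorize and handle incoming customer support requests.")
# Send `meta` as a normal user message; the reply is your draft prompt template,
# which you then fine-tune to your own business fields.
```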

Evaluation Mode: Compare multiple prompts side by side and score Claude’s outputs

Evaluation mode in the Workbench lets you display the outputs of two or more prompts side by side and rate Claude’s results on a 5-point scale. It addresses a very practical pain point: for the same task, whether changing a single sentence actually improves things used to be judged only by “feel.”

When using evaluation mode, it’s recommended to keep the input samples fixed (the same batch of user questions, the same text) and change only one prompt variable—such as tone, constraints, or output structure. This helps you pinpoint faster whether the variation comes from differences in Claude’s model performance or from the way the prompt is written.
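The fixed-inputs, one-variable discipline described above can also be reproduced locally when you want to score more samples than is comfortable in the console. The harness below is a minimal sketch: `run_model` and `score` are placeholders you supply (a real API call and your own 5-point rubric), not part of any Anthropic SDK.

```python
from typing import Callable, Iterable

def evaluate_variants(
    run_model: Callable[[str, str], str],   # (prompt, sample) -> model output
    prompts: dict,                          # variant name -> prompt text
    samples: Iterable[str],                 # the fixed batch of inputs
    score: Callable[[str], int],            # your 1-5 rating of one output
) -> dict:
    """Run every prompt variant over the SAME fixed samples and return
    the mean score per variant, mirroring the side-by-side comparison."""
    samples = list(samples)  # reuse the exact same inputs for each variant
    results = {}
    for name, prompt in prompts.items():
        scores = [score(run_model(prompt, s)) for s in samples]
        results[name] = sum(scores) / len(scores)
    return results

# Demo with a stubbed model so the harness is runnable offline:
fake_model = lambda prompt, sample: f"{prompt}:{sample}"
ratings = evaluate_variants(
    fake_model,
    prompts={"A": "terse", "B": "verbose"},
    samples=["q1", "q2"],
    score=lambda out: 5 if "terse" in out else 3,
)
```

Because only the prompt text changes between variants, a gap in the mean scores points at the prompt wording rather than at noise in the inputs.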

Usage and Cost Dashboard: Understand your bill by USD, tokens, and API key

In the new "Usage" and "Cost" tabs in the developer console, you can track Claude API usage by dollar amount, token count, and API key. For teams sharing an account across multiple environments (test/production) or projects, this makes "who is spending the money" directly visible.

If you’re running A/B prompt experiments, it’s recommended to watch both tokens and cost: some prompts make Claude’s outputs longer, with little improvement in results but a significant increase in cost—the dashboard makes this easy to spot at a glance.

Release Notes and New Documentation Resources: Less likely to get tripped up when updates are frequent

Claude’s documentation now includes more comprehensive release notes covering updates to the API, the Claude console, and the Claude app. For those integrating into production systems, this is more reliable than “digging through announcements everywhere”: you can clearly see where changes occurred and whether they will affect existing calls.

At the same time, Anthropic has also updated its documentation and educational courses (such as Claude API fundamentals and using Claude tools) and expanded the core skills guides in the Claude Cookbook (citations, retrieval-augmented generation, classification). If you want to plug Claude into a toolchain or require structured JSON output, these resources can significantly reduce trial and error.
