
Introduction to New Features in the Claude API Workbench: Extended Output, Evaluation Mode, and the Usage Dashboard

2/16/2026
Claude

Recent Claude updates for developers have leaned more “practical”: not only improving model capabilities, but also filling in everyday necessities like debugging, comparing prompts, and checking billing. This article breaks down the new features in the Claude API and the Claude Console workbench that are worth using immediately, explained clearly by usage scenario.

Claude Sonnet 3.5 Extended Output: Writing long-form content is easier—and easier to control

In the Claude API, Claude Sonnet 3.5’s maximum output tokens have been increased from 4096 to 8192, making it suitable for “write it all in one go” tasks such as long reports, code generation, and meeting minutes. To enable extended output, you need to include the specified beta request header in your request.

The official approach is to add the header "anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15". It's also recommended to set max_tokens close to the length you actually need, so the higher ceiling doesn't quietly translate into unnecessary cost.
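As a minimal sketch, here is what the request looks like with the beta header in place. The API key and prompt text are placeholders, and the request is only assembled here, not sent:

```python
# Sketch: enabling the 8192-token output ceiling for Claude 3.5 Sonnet
# by adding the documented beta header to a Messages API request.
import json

API_URL = "https://api.anthropic.com/v1/messages"

headers = {
    "x-api-key": "YOUR_API_KEY",  # placeholder
    "anthropic-version": "2023-06-01",
    # This beta header raises the output limit from 4096 to 8192 tokens.
    "anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15",
    "content-type": "application/json",
}

payload = {
    "model": "claude-3-5-sonnet-20240620",
    # Set max_tokens near what the task actually needs,
    # rather than always requesting the full 8192.
    "max_tokens": 8192,
    "messages": [{"role": "user", "content": "Draft the full project report in one pass."}],
}

body = json.dumps(payload)
```

From here, any HTTP client (or the official SDK, which accepts extra headers) can send the request; the key point is simply that the header and max_tokens travel together.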

Workbench Prompt Generator: Turn a “requirements description” into reusable prompts

The Claude Console workbench has added a prompt generator. You simply describe the task in natural language (e.g., “categorize and handle incoming customer support requests”), and Claude will produce a more complete prompt draft. For teams that need standardized outputs and batch processing, this step can significantly reduce repetitive trial and error.

In practice, it’s recommended to fill in three things in your input: the goal, the output format, and the boundary conditions. This makes the prompts Claude generates more likely to be directly usable, rather than “looking professional but being hard to execute.”
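The three-part input can be kept consistent with a small helper. This is an illustrative template of my own, not an official format the generator requires:

```python
# Illustrative helper (not an official Anthropic format): assemble the
# goal, output format, and boundary conditions into one task description
# to paste into the workbench prompt generator.
def build_generator_input(goal: str, output_format: str, boundaries: str) -> str:
    return "\n".join([
        f"Goal: {goal}",
        f"Output format: {output_format}",
        f"Boundary conditions: {boundaries}",
    ])

task = build_generator_input(
    goal="Categorize incoming customer support requests into Billing, Technical, or Other.",
    output_format='JSON with fields "category" and "reason".',
    boundaries="If a request mixes topics, pick the dominant one; never invent a fourth category.",
)
```

Keeping these three pieces explicit is what moves the generated prompt from "looking professional" to actually executable.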

Evaluation Mode: Compare prompts side by side—rely less on intuition and more on results

The workbench’s evaluation mode supports displaying the outputs of two or more prompts side by side, and scoring Claude’s outputs on a 5-point scale. It’s especially useful for prompt A/B testing: with the same batch of sample inputs, see which prompt set is more stable and better matches formatting requirements.

If you’re working on quantifiable tasks like classification, extraction, or summarization, it’s recommended to first use evaluation mode to establish a fixed “sample question set.” After that, each time you fine-tune a prompt, you can quickly tell whether it truly improved—rather than relying on how a single conversation happens to feel.
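The same idea can be mirrored locally as a tiny A/B harness: a fixed sample set plus a simple exact-match score. In this sketch, `run_prompt` is a stand-in keyword rule so the harness runs without network access; in real use it would call the Claude API once per prompt per sample:

```python
# Sketch of a local prompt A/B harness: a fixed "sample question set"
# scored by exact match, so prompt tweaks are judged on data, not feel.

SAMPLES = [
    {"input": "My invoice is wrong", "expected": "Billing"},
    {"input": "The app crashes on login", "expected": "Technical"},
    {"input": "Where is my order?", "expected": "Shipping"},
]

def run_prompt(prompt: str, text: str) -> str:
    # Stand-in for a real Claude call; a trivial keyword rule for demonstration.
    lowered = text.lower()
    if "invoice" in lowered or "charge" in lowered:
        return "Billing"
    if "crash" in lowered or "error" in lowered:
        return "Technical"
    return "Shipping"

def score(prompt: str) -> float:
    """Fraction of samples where the prompt's output matches the expectation."""
    hits = sum(run_prompt(prompt, s["input"]) == s["expected"] for s in SAMPLES)
    return hits / len(SAMPLES)

prompt_a = "Classify the request into Billing, Technical, or Shipping."
prompt_b = "You are a triage bot. Answer with exactly one category name."
print(f"A: {score(prompt_a):.2f}  B: {score(prompt_b):.2f}")
```

The fixed sample set is the important part: once it exists, every prompt revision gets the same exam, which is exactly what the workbench's side-by-side view gives you in the UI.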

Usage and Cost Dashboard: More intuitive tracking by dollars, tokens, and API key

The Claude developer console has added “Usage” and “Cost” tabs, letting you view consumption by dollar amount, token count, and API key. For scenarios with multiple environments (testing/production) or multiple projects sharing the Claude API, this view makes it faster to pinpoint “who exactly burned through all the tokens.”

An even more practical approach is to split API keys by project, and use the dashboard to periodically review peak time windows. That way, when you optimize prompts or shorten output length, you can validate the savings directly with data.
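If you export or log usage records yourself, the per-key breakdown is a few lines of aggregation. The record fields and key names below are made up for illustration; the shape of your own logs will differ:

```python
# Sketch: aggregating token usage per API key, the same breakdown the
# Usage/Cost tabs give you. One key per project makes this attribution trivial.
from collections import defaultdict

records = [
    {"api_key": "proj-frontend", "input_tokens": 1200, "output_tokens": 800},
    {"api_key": "proj-backend",  "input_tokens": 5000, "output_tokens": 4000},
    {"api_key": "proj-frontend", "input_tokens": 300,  "output_tokens": 150},
]

totals: dict[str, int] = defaultdict(int)
for r in records:
    totals[r["api_key"]] += r["input_tokens"] + r["output_tokens"]

# "Who exactly burned through all the tokens?"
top_key = max(totals, key=totals.get)
print(top_key, totals[top_key])
```

Running the same aggregation before and after a prompt optimization is the simplest way to confirm the savings with data rather than impressions.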

Release Notes and Learning Resources: Make Claude’s changes “trackable and learnable”

Claude’s documentation now includes more comprehensive release notes covering update histories for the API, the Claude Console, and the Claude app, making it easier to troubleshoot “why the same request now yields different results.” At the same time, official training courses (such as Claude API Fundamentals and using Claude tools) have been launched, and the Claude Cookbook has been expanded—rounding out practical materials for common capabilities like citations, retrieval-augmented generation, and classification.

If you want to turn Claude into a stable production toolchain, the value of these resources is that they reduce reliance on word-of-mouth “magic parameters,” and give the team a unified standard for Claude’s capability boundaries and best practices.
