If you use the Claude API day to day for customer support automation, content generation, or coding assistance, this round of Developer Console updates will make your workflow noticeably smoother. The focus this time isn't flashy features; it's making the three most time-consuming chores (output limits, prompt debugging, and billing tracking) clearer and more controllable. Below, in the order you'd actually use them, I'll walk through the key changes in depth.
Claude API extended output: Claude 3.5 Sonnet up to 8192 tokens
In the Claude API, the maximum output for Claude 3.5 Sonnet has been raised from 4096 to 8192 tokens, making it far less likely that long-form summaries, batch rewrites, or complete technical documents get cut off midway. Extended output is opt-in: you must include a specific beta header in each request.
Specifically, add "anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15" to the request headers, and also raise your max_tokens setting (within what your account allows). It's worth running a small-traffic test first: once outputs get longer, both response time and token costs become more sensitive.
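Putting the two settings together, a minimal sketch might look like the following. The header value and the 8192 limit come from the announcement above; the helper name build_extended_output_request and the specific model ID are my illustrative choices, not part of any SDK.

```python
def build_extended_output_request(prompt: str, max_tokens: int = 8192) -> dict:
    """Assemble headers and JSON body for a Messages API call with the
    extended-output beta enabled."""
    headers = {
        # Beta header that unlocks the 8192-token output ceiling:
        "anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15",
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    body = {
        "model": "claude-3-5-sonnet-20240620",  # illustrative model ID
        "max_tokens": max_tokens,  # previously capped at 4096
        "messages": [{"role": "user", "content": prompt}],
    }
    return {"headers": headers, "body": body}

req = build_extended_output_request("Summarize the attached spec in detail.")
# POST req["body"] to https://api.anthropic.com/v1/messages with your
# x-api-key header (plus the headers above) to send the actual request.
```

If you use the official Python SDK instead of raw HTTP, the same beta header can typically be passed per call via its extra-headers mechanism; either way, the header plus a higher max_tokens is all the opt-in requires.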
The Workbench is more usable: the Prompt Generator turns “writing prompts from scratch” into a reusable workflow
A new "Prompt Generator" has been added to the Workbench in the Claude API console. You describe the task in one sentence (for example, "classify and handle incoming customer support requests"), and it produces a structured prompt skeleton. This is a real time-saver for team collaboration: colleagues no longer each write their own version from scratch, and templates and wording are easier to standardize.
The approach I recommend more is: let the generator produce a first draft, then add your business constraints—for example, “must output JSON,” “must not fabricate sources,” or “must reference field names from the input.” Prompts produced this way are more stable and better suited to being deployed directly in a production Claude API environment.
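That draft-plus-constraints workflow is easy to codify. The sketch below is purely illustrative (the draft text, the constraint list, and the assemble_prompt helper are all mine, not anything the console emits), but it shows the shape: keep the generated skeleton and your business rules as separate pieces, then combine them at deploy time.

```python
# Skeleton produced by the Prompt Generator (illustrative placeholder text).
GENERATED_DRAFT = """You are a support-ticket triage assistant.
Classify each incoming request and suggest a next action."""

# Team-specific business constraints, maintained separately so they can be
# reviewed and reused across prompts.
CONSTRAINTS = [
    "Respond with valid JSON only, using the keys: category, priority, action.",
    "Do not fabricate sources; reference only field names present in the input.",
]

def assemble_prompt(draft: str, constraints: list[str]) -> str:
    """Append hard constraints to a generated draft as a bulleted rule list."""
    rules = "\n".join(f"- {c}" for c in constraints)
    return f"{draft}\n\nHard constraints:\n{rules}"

final_prompt = assemble_prompt(GENERATED_DRAFT, CONSTRAINTS)
```

Keeping constraints in a list like this also makes them diffable in version control, which helps when the team standardizes wording.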
Evaluation mode: turn Claude API prompt tuning into “controlled experiments”
The Workbench's "Evaluation mode" supports side-by-side comparison of outputs from two or more prompts and lets you score the results on a 5-point scale. It solves an old problem: you believe prompt B is better, when in fact it only happened to fit one particular test sample.
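The underlying idea (score every variant across a shared sample set, then compare aggregates) can be sketched locally as well. Everything below is a hypothetical stand-in, not a console API: score_fn here mimics a human 1–5 rating, and the toy scoring rule exists only to make the example runnable.

```python
from statistics import mean

def evaluate(variants: dict, samples: list, score_fn) -> dict:
    """Return the mean 1-5 score per prompt variant across all samples,
    so no single lucky sample decides the winner."""
    return {
        name: mean(score_fn(prompt, sample) for sample in samples)
        for name, prompt in variants.items()
    }

# Toy setup: samples that do or don't need structured output, and a scoring
# rule that rewards prompts demanding JSON only when the sample needs it.
samples = [{"needs_json": True}, {"needs_json": False}, {"needs_json": True}]
variants = {"A": "free-form answer", "B": "answer in JSON"}
score_fn = lambda prompt, s: 5 if ("JSON" in prompt) == s["needs_json"] else 2

print(evaluate(variants, samples, score_fn))  # → {'A': 3.0, 'B': 4.0}
```

The point of averaging over the whole set is exactly the "controlled experiment" framing: variant B wins here because it scores well on two of three samples, not because it matched one cherry-picked case.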


