If you’re using the Claude API for customer support, RAG, or a coding assistant, several recent updates are well worth adopting right away: longer outputs, clearer citations, more cost-effective prompt reuse, and more practical console tools. Below is a rundown of the new Claude API features, focused on what you can start using immediately.
Longer output: Sonnet extended to 8192 tokens
The Claude API now supports Claude 3.5 Sonnet’s extended output capability, increasing the maximum single response from 4,096 to 8,192 tokens. For tasks like long-form summarization, code generation, and report writing, truncation will be noticeably reduced.
Enabling it is straightforward: add the designated beta request header to your Claude API calls and the server raises the model’s output limit. Before rolling it out, run the same set of inputs with and without the flag, compare how often responses complete cleanly versus truncating and whether hallucination rates shift, and then decide whether to turn it on by default.
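As a minimal sketch, here is one way to assemble the raw request. The beta header value matches Anthropic’s documented flag for this feature at the time of writing, but verify it against the current docs; the API key is a placeholder:

```python
def build_long_output_request(prompt: str) -> tuple[dict, dict]:
    """Return (headers, body) for a Messages API call with the raised output limit."""
    headers = {
        "x-api-key": "YOUR_API_KEY",          # placeholder: your real key goes here
        "anthropic-version": "2023-06-01",
        # Opt-in beta flag that raises the max output to 8,192 tokens.
        "anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15",
        "content-type": "application/json",
    }
    body = {
        "model": "claude-3-5-sonnet-20240620",
        "max_tokens": 8192,                   # raised from the 4,096 default cap
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, body

headers, body = build_long_output_request("Summarize the attached report in detail.")
```

Keeping the A/B comparison in mind, the same function with `max_tokens: 4096` and no beta header gives you the baseline to diff against.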
Long context: Sonnet offers 1M-token preview support
The Claude API provides preview support for an ultra-long context window on Claude Sonnet 4, and has also increased the rate limits related to long-context usage. For tasks such as “reviewing an entire code repository,” “comparing a full set of contracts/tender documents,” and “Q&A across a multi-chapter knowledge base,” long context can significantly reduce the engineering overhead of chunking and stitching.
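A sketch of opting into the long-context preview for a whole-repo review. Both the beta header value and the model ID are assumptions that should be checked against current Anthropic documentation:

```python
def build_long_context_request(repo_files: dict[str, str], question: str) -> tuple[dict, dict]:
    """Assemble a Messages API request that sends an entire repo as one prompt."""
    # Concatenate every file with a path marker so the model can reference locations.
    corpus = "\n\n".join(f"=== {path} ===\n{text}" for path, text in repo_files.items())
    headers = {
        "x-api-key": "YOUR_API_KEY",                # placeholder
        "anthropic-version": "2023-06-01",
        "anthropic-beta": "context-1m-2025-08-07",  # assumed long-context beta flag
        "content-type": "application/json",
    }
    body = {
        "model": "claude-sonnet-4-20250514",        # assumed Sonnet 4 model ID
        "max_tokens": 2048,
        "messages": [
            {"role": "user", "content": f"{corpus}\n\nQuestion: {question}"},
        ],
    }
    return headers, body
```

The path markers are a cheap way to get file-level attributions back without any extra tooling.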
Note that longer context does not mean cheaper: cost and latency both grow with prompt length, so very long prompts make billing and throughput far more sensitive. In production, tier your content into “original text that must stay in the context” versus “material that can be fetched on demand via retrieval,” rather than stuffing everything in at once.
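One way to implement that layering is sketched below. This is illustrative only: the token estimate is a rough heuristic, and `score_relevance` is a stand-in for whatever retriever or ranker you already use:

```python
def layer_context(docs: list[dict], budget_tokens: int, score_relevance) -> tuple[list, list]:
    """Split docs into 'pin in the prompt' vs 'leave to retrieval'.

    Each doc is {"text": str, "must_include": bool}. Docs marked must_include
    are always pinned; the rest are pinned by relevance until the budget runs out.
    """
    def est_tokens(text: str) -> int:
        return len(text) // 4  # rough heuristic: ~4 characters per token

    pinned = [d for d in docs if d["must_include"]]
    used = sum(est_tokens(d["text"]) for d in pinned)
    retrievable = []
    # Consider optional docs in descending relevance order.
    for d in sorted((d for d in docs if not d["must_include"]),
                    key=score_relevance, reverse=True):
        cost = est_tokens(d["text"])
        if used + cost <= budget_tokens:
            pinned.append(d)
            used += cost
        else:
            retrievable.append(d)
    return pinned, retrievable
```

Everything in `retrievable` then goes into your index and is pulled in per-query instead of riding along in every prompt.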
Citations and search-result content blocks: making RAG feel more like “verifiable answers”
The Claude API now provides citation capabilities that attribute sources in responses, and search-result content blocks are generally available, making them a natural fit for retrieval-augmented generation (RAG) pipelines that need to produce “responses with sources.” For compliance, legal, and after-sales knowledge-base scenarios, citations can reduce back-and-forth disputes: users can see exactly where an answer comes from.
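A sketch of feeding retrieved passages to the model as a search-result content block with citations enabled. The field names follow Anthropic’s published shape for search results, but treat the exact schema (and the placeholder URL) as something to verify before shipping:

```python
def make_search_result_block(url: str, title: str, passages: list[str]) -> dict:
    """Build a search_result content block so the model can cite retrieved passages."""
    return {
        "type": "search_result",
        "source": url,                       # where this result came from
        "title": title,
        "content": [{"type": "text", "text": p} for p in passages],
        "citations": {"enabled": True},      # ask the model to cite these passages
    }

# A user turn mixing retrieved evidence with the actual question.
user_message = {
    "role": "user",
    "content": [
        make_search_result_block(
            "https://example.com/warranty-policy",   # placeholder URL
            "Warranty Policy",
            ["Returns are accepted within 30 days of purchase."],
        ),
        {"type": "text", "text": "How long is the return window?"},
    ],
}
```

The response can then carry citation annotations pointing back at the block’s `source` and `title`, which is what lets the UI render “answer + source” instead of a bare paragraph.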


