If you’re using the Claude API for customer support, RAG, or a coding assistant, several recent updates are well worth adopting right away: longer outputs, clearer citations, more cost-effective prompt reuse, and more practical console tools. Below is a rundown of the new Claude API features, focused on what you can start using immediately.
Longer output: Sonnet extended to 8192 tokens
The Claude API now supports Claude 3.5 Sonnet’s extended output capability, increasing the maximum single response from 4,096 to 8,192 tokens. For tasks like long-form summarization, code generation, and report writing, truncation will be noticeably reduced.
Enabling it is straightforward: add the designated beta request header to your Claude API calls and the server raises the model’s output limit. Before rolling it out, run the same set of inputs with and without the flag, compare how often responses complete cleanly versus truncating and whether hallucination rates shift, and then decide whether to turn it on by default.
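As a minimal sketch, here is one way to assemble the raw request. The beta header value matches Anthropic’s documented flag for this feature at the time of writing, but verify it against the current docs; the API key is a placeholder:

```python
def build_long_output_request(prompt: str) -> tuple[dict, dict]:
    """Return (headers, body) for a Messages API call with the raised output limit."""
    headers = {
        "x-api-key": "YOUR_API_KEY",          # placeholder: your real key goes here
        "anthropic-version": "2023-06-01",
        # Opt-in beta flag that raises the max output to 8,192 tokens.
        "anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15",
        "content-type": "application/json",
    }
    body = {
        "model": "claude-3-5-sonnet-20240620",
        "max_tokens": 8192,                   # raised from the 4,096 default cap
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, body

headers, body = build_long_output_request("Summarize the attached report in detail.")
```

Keeping the A/B comparison in mind, the same function with `max_tokens: 4096` and no beta header gives you the baseline to diff against.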
Long context: Sonnet offers 1M-token preview support
The Claude API provides preview support for an ultra-long context window on Claude Sonnet 4, and has also increased the rate limits related to long-context usage. For tasks such as “reviewing an entire code repository,” “comparing a full set of contracts/tender documents,” and “Q&A across a multi-chapter knowledge base,” long context can significantly reduce the engineering overhead of chunking and stitching.
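A sketch of opting into the long-context preview for a whole-repo review. Both the beta header value and the model ID are assumptions that should be checked against current Anthropic documentation:

```python
def build_long_context_request(repo_files: dict[str, str], question: str) -> tuple[dict, dict]:
    """Assemble a Messages API request that sends an entire repo as one prompt."""
    # Concatenate every file with a path marker so the model can reference locations.
    corpus = "\n\n".join(f"=== {path} ===\n{text}" for path, text in repo_files.items())
    headers = {
        "x-api-key": "YOUR_API_KEY",                # placeholder
        "anthropic-version": "2023-06-01",
        "anthropic-beta": "context-1m-2025-08-07",  # assumed long-context beta flag
        "content-type": "application/json",
    }
    body = {
        "model": "claude-sonnet-4-20250514",        # assumed Sonnet 4 model ID
        "max_tokens": 2048,
        "messages": [
            {"role": "user", "content": f"{corpus}\n\nQuestion: {question}"},
        ],
    }
    return headers, body
```

The path markers are a cheap way to get file-level attributions back without any extra tooling.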
Note that longer context does not mean cheaper: cost and latency both grow with prompt length, so very long prompts make billing and throughput far more sensitive. In production, tier your content into “original text that must stay in the context” versus “material that can be fetched on demand via retrieval,” rather than stuffing everything in at once.
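One way to implement that layering is sketched below. This is illustrative only: the token estimate is a rough heuristic, and `score_relevance` is a stand-in for whatever retriever or ranker you already use:

```python
def layer_context(docs: list[dict], budget_tokens: int, score_relevance) -> tuple[list, list]:
    """Split docs into 'pin in the prompt' vs 'leave to retrieval'.

    Each doc is {"text": str, "must_include": bool}. Docs marked must_include
    are always pinned; the rest are pinned by relevance until the budget runs out.
    """
    def est_tokens(text: str) -> int:
        return len(text) // 4  # rough heuristic: ~4 characters per token

    pinned = [d for d in docs if d["must_include"]]
    used = sum(est_tokens(d["text"]) for d in pinned)
    retrievable = []
    # Consider optional docs in descending relevance order.
    for d in sorted((d for d in docs if not d["must_include"]),
                    key=score_relevance, reverse=True):
        cost = est_tokens(d["text"])
        if used + cost <= budget_tokens:
            pinned.append(d)
            used += cost
        else:
            retrievable.append(d)
    return pinned, retrievable
```

Everything in `retrievable` then goes into your index and is pulled in per-query instead of riding along in every prompt.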
Citations and search-result content blocks: making RAG feel more like “verifiable answers”
The Claude API now provides citation capabilities that attribute sources in responses, and search-result content blocks are generally available, making them a natural fit for retrieval-augmented generation (RAG) pipelines that need to produce “responses with sources.” For compliance, legal, and after-sales knowledge-base scenarios, citations can reduce back-and-forth disputes: users can see exactly where an answer comes from.
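A sketch of feeding retrieved passages to the model as a search-result content block with citations enabled. The field names follow Anthropic’s published shape for search results, but treat the exact schema (and the placeholder URL) as something to verify before shipping:

```python
def make_search_result_block(url: str, title: str, passages: list[str]) -> dict:
    """Build a search_result content block so the model can cite retrieved passages."""
    return {
        "type": "search_result",
        "source": url,                       # where this result came from
        "title": title,
        "content": [{"type": "text", "text": p} for p in passages],
        "citations": {"enabled": True},      # ask the model to cite these passages
    }

# A user turn mixing retrieved evidence with the actual question.
user_message = {
    "role": "user",
    "content": [
        make_search_result_block(
            "https://example.com/warranty-policy",   # placeholder URL
            "Warranty Policy",
            ["Returns are accepted within 30 days of purchase."],
        ),
        {"type": "text", "text": "How long is the return window?"},
    ],
}
```

The response can then carry citation annotations pointing back at the block’s `source` and `title`, which is what lets the UI render “answer + source” instead of a bare paragraph.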


