Claude's New Feature Unlocks Computer Control: AI Assistant Can Now Move Your Mouse and Type

Anthropic recently delivered a revolutionary update to Claude 3.5 Sonnet—it's no longer limited to text conversations. Now it can view your screen, move the cursor, and press keys just like a person, truly helping you operate your computer. If you're still filling out forms manually or copying and pasting data, this upgrade might change the way you work entirely. Let's explore how strong this new "computer operation" capability really is and what scenarios it can handle.

How Does Claude Control a Computer Like a Human?

Anthropic built a dedicated API for Claude that lets it "see" the computer interface—essentially by taking screenshots, understanding button and input field locations, and then generating commands to move the mouse, click, and type. After integrating this API, developers can ask Claude to perform tasks like: "Open the Excel spreadsheet on my desktop, copy the numbers in column B into the web form, and submit it." Claude will inspect the screen step by step, move the cursor, and operate the browser—much like directing an intern remotely.

In the OSWorld benchmark, which evaluates an AI's ability to use a computer, the updated Claude 3.5 Sonnet achieved a score of 14.9% using only screenshots—far ahead of the second-place Cradle BAAI at 7.8%. With more operation steps, its score can climb to 22%. While still behind the human baseline of over 70%, it is currently the most "computer-literate" AI available.

Significant Coding Improvements: More Reliable Code Writing

Beyond computer control, the new Claude 3.5 Sonnet also shows impressive gains in programming. On SWE-bench Verified—a benchmark measuring an AI's ability to solve real-world software problems—its score jumped from 40.6% to 49%, surpassing all public models including OpenAI o1-preview. After testing, GitLab found that Claude's reasoning ability in multi-step software development processes improved by 10%, with no increase in latency. In other words, asking it to write a complete web application module or debug complex code logic is now more dependable than before.

If speed is a priority, Anthropic also offers the new Claude 3.5 Haiku. It costs the same and runs as fast as the previous Haiku, but its intelligence level even surpasses its older sibling Claude 3 Opus. Especially on coding tasks, Haiku scored 40.6% on SWE-bench Verified—stronger than the original Claude 3.5 Sonnet and GPT-4. It's ideal for scenarios requiring rapid iteration and frequent calls, such as automated testing, log analysis, or code completion.

How Can Developers Access These New Capabilities?

The upgraded Claude 3.5 Sonnet is now available to all users. Developers can access the computer control feature (note: it's still in beta) via the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI. However, this feature is far from perfect—actions like scrolling, dragging, and zooming, which humans find simple, often trip up Claude, and long screen recordings can cause task interruptions. Still, companies like Asana, Canva, and Replit are already using it to automate repetitive workflows, such as auto-filling forms and inspecting app interface behavior. Claude 3.5 Haiku is expected to launch by the end of the month, initially supporting text only, with image input added later.

If you're a developer or frequently bogged down by tedious tasks like filling out forms and moving data around, it's worth letting Claude handle the clicking. It may still be a clumsy rookie, but its pace of improvement is hard to ignore.

How Does Claude Control a Computer Like a Human?

Significant Coding Improvements: More Reliable Code Writing

How Can Developers Access These New Capabilities?

Search articles

Popular Articles

Some of the best ChatGPT prompts—methods that can truly boost efficiency by 10x

Claude Code Installation Keeps Failing? A Step-by-Step Guide to Fix the Setup in 3 Steps

ChatGPT, Claude, Gemini, and Midjourney output fail-safe troubleshooting checklist and KISS prompt tips

An efficient ChatGPT + Claude + Gemini + Midjourney workflow to solve inconsistent outputs and rewrite meltdowns

ChatGPT and Claude always miss the point: three questioning techniques to make AI instantly understand your needs