In the latest round of updates, ChatGPT has been upgraded from “able to chat” to “able to listen, see, and collaborate.” If you usually use it for writing, translation, meeting notes, or data analysis, these new features will noticeably change the pace of your workflow. Below, I’ll walk through the key changes by scenario.
Multimodal upgrade: ChatGPT is more like an “all-purpose assistant”
ChatGPT has gradually adopted GPT-4o as its core capability backbone, with a focus on multimodality: more natural understanding of text, voice, and images. You can drop in a screenshot and have ChatGPT explain the interface, spot where something went wrong, or turn the contents of a chart into readable takeaways. Compared with the old workflow of describing everything back and forth in text alone, the communication cost is noticeably lower.
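If you want to reproduce that screenshot-to-explanation workflow outside the ChatGPT app, a minimal sketch with the official OpenAI Python SDK might look like the following. Treat it as an illustration of what “multimodal” means in practice, not as what the app does internally; the model name gpt-4o and the file screenshot.png are assumptions you would swap for your own.

```python
# A minimal sketch, assuming the official OpenAI Python SDK ("pip install openai")
# and an OPENAI_API_KEY set in the environment. "screenshot.png" is a placeholder.
import base64
from openai import OpenAI

client = OpenAI()

# Encode the local screenshot as a base64 data URL so it can travel in the request.
with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model name; any vision-capable model works the same way
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Explain this interface and point out anything that looks misconfigured.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The only real difference from a text-only request is that the user message carries a list of content parts, mixing text with an image, instead of a single string.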
Advanced Voice Mode: smoother conversations, responses that feel more human
Voice has always been one of ChatGPT’s most practical entry points. Since “Advanced Voice Mode” began rolling out to a subset of users in testing, its realism and coherence have drawn more attention. The value it brings isn’t just “being able to talk”: it is better suited to continuous follow-up questions, real-time interruptions, and multi-turn discussions. For people who need speaking practice or who dictate meeting minutes, ChatGPT’s usability improves significantly.
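Advanced Voice Mode itself lives inside the ChatGPT app, but the “dictate, then summarize” idea mentioned above can be approximated with the API. Here is a hedged sketch of a meeting-minutes pipeline; the file name meeting.m4a and the summarization prompt are assumptions, and the two-step transcribe-then-summarize flow is my own framing rather than anything the voice mode exposes.

```python
# A minimal sketch, assuming the OpenAI Python SDK and a local recording "meeting.m4a".
# This approximates a "dictate meeting minutes" workflow via the API, not Advanced Voice Mode.
from openai import OpenAI

client = OpenAI()

# Step 1: transcribe the recording with the standard transcription model.
with open("meeting.m4a", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)

# Step 2: turn the raw transcript into structured minutes.
summary = client.chat.completions.create(
    model="gpt-4o",  # assumed model name
    messages=[
        {
            "role": "user",
            "content": "Turn this transcript into concise meeting minutes "
                       "with decisions and action items:\n\n" + transcript.text,
        }
    ],
)

print(summary.choices[0].message.content)
```

In the app, the same loop happens conversationally: you talk, interrupt, and refine, and the model keeps the thread without you re-pasting anything.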


