This article covers several key new ChatGPT features: voice and image understanding enabled by multimodal models, cloud file importing, the desktop experience, and more transparent memory controls. Together they push ChatGPT beyond simply “being able to chat” and make it a handier work assistant. Below, I’ll break them down by usage scenario.
GPT-4o Multimodal: smoother text, voice, and image interactions
Now that GPT-4o is one of ChatGPT’s core models, handling text, voice, and images within the same conversation feels more complete. You can have ChatGPT read what’s in an image and then ask follow-up questions in text, or switch to speaking your request out loud. For everyday writing, spreadsheet understanding, and extracting information from images, the biggest change is that you spend noticeably less effort explaining context back and forth.
Advanced Voice Mode: more like a conversation, not reading a script
As ChatGPT’s Advanced Voice Mode rolls out to users in batches, the focus has been on the naturalness, response speed, and stability of voice conversations. It doesn’t just read text answers aloud; it follows the rhythm of a real-time conversation, which makes it suitable for quickly reviewing an outline before a meeting, or for dictating ideas while walking and having ChatGPT organize them. Note that this feature is typically released in phases, so whether you can use it depends on whether it appears in your account’s interface.
Import directly from Google Drive / OneDrive: one less step for data analysis
When making reports or doing data analysis, ChatGPT supports selecting and uploading files directly from Google Drive and Microsoft OneDrive, which removes the repeated steps of downloading files and hunting for them locally. After you hand a spreadsheet to ChatGPT, you can keep asking follow-up questions like “How can the chart be made clearer?” or “Are the definitions consistent?”, and export customized charts for presentations. Before uploading, confirm that the file contains no sensitive fields, so you don’t bring data that shouldn’t be shared into the conversation.
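One way to make the "no sensitive fields" check routine is to scan a spreadsheet's column headers locally before uploading. The snippet below is a minimal sketch, not part of ChatGPT or any cloud drive: the keyword list and the function names (`find_sensitive_columns`, `read_header`) are assumptions made for illustration, and the match is a simple substring heuristic that can miss fields or flag harmless ones.

```python
import csv
import io
import re

# Hypothetical keyword list; adjust it to your own data policy.
SENSITIVE_PATTERNS = [
    "password", "passwd", "ssn", "id_number",
    "email", "phone", "address", "birth",
]


def read_header(csv_text):
    """Return the first row of a CSV document as a list of column names."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader, [])


def find_sensitive_columns(header, patterns=SENSITIVE_PATTERNS):
    """Return column names whose normalized form contains any keyword."""
    flagged = []
    for name in header:
        # Lowercase and collapse punctuation/spaces so "ID Number" -> "id_number".
        normalized = re.sub(r"[^a-z0-9]+", "_", name.strip().lower())
        if any(pattern in normalized for pattern in patterns):
            flagged.append(name)
    return flagged


# Made-up sample data for demonstration only.
sample = "Region,Revenue,Customer Email,ID Number\nEMEA,1200,a@example.com,X123\n"
flagged = find_sensitive_columns(read_header(sample))
print(flagged)
```

If the list comes back non-empty, you can drop or mask those columns in a copy of the file and upload the copy instead. A header check like this says nothing about sensitive values hidden in free-text cells, so it complements a manual look rather than replacing it.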
Memory and new controls: more personalized, and more controllable
ChatGPT’s memory feature has been opened to more users across versions, with clearer prompts and control options: when ChatGPT updates its memory, it tells you more proactively what changed. You can think of it as preference tracking that you can switch on or off, covering things like your usual tone or work background. It is not suitable for storing passwords, ID numbers, or similar information. A safer approach is to let ChatGPT remember only writing style and format preferences, rather than specific private content.
Desktop app and no-account usage: lower barrier to entry, but the experience differs
The ChatGPT macOS app offers a quicker way to invoke it (for example, Option + Space) and supports uploading files and photos from the desktop as well as voice conversations, which makes it well suited to use as an always-at-hand assistant. ChatGPT also offers a way to use it without an account, but with limits on chat saving, sharing, and personalization. If you care about a continuous workflow and history, the full logged-in experience is still the better choice.