In its latest major update, ChatGPT introduced several exciting feature upgrades, with the rollout of the GPT-4o model marking a significant milestone. This update not only improves response speed but also brings AI closer to real human interaction, evolving from simple text conversations to understanding images, sounds, and emotions. This article takes you through these new ChatGPT features and explores how they are changing our daily usage habits.
GPT-4o Model: The Perfect Fusion of Versatility and Speed
The "o" in GPT-4o stands for "omni," integrating audio, video, and text reasoning into a true multimodal model. Compared to the previous GPT-4 Turbo, GPT-4o's API is faster and up to 50% cheaper. Responses are nearly instantaneous, with speeds twice as fast as GPT-4. Users can now experience smoother conversations in ChatGPT without long wait times.
Excitingly, GPT-4o can engage in real-time conversations like a human, even detecting emotions behind the user's tone. For example, it can tell from heavy breathing that you’ve just exercised and offer a personalized reply. Two GPT-4o instances can even talk to each other, describe what they see, or sing a song together, demonstrating stronger collaboration between AI. These new ChatGPT features greatly enhance the naturalness and fun of interaction.
Multimodal Interaction and Visual Recognition
One of the core upgrades in GPT-4o is its visual capability. It can now effectively assist visually impaired users in understanding their surroundings, such as reporting directions or hailing a taxi. In a demo, after scanning the environment, GPT-4o instantly recognized objects and inferred possible work scenarios, showing great potential in healthcare and personal assistance.

