
GPT-4o, OpenAI’s new flagship model, can respond to audio inputs in as little as 232 ms!

GPT-4o can reason across audio, vision and text in real time, a clear step up from OpenAI's earlier models, whose voice capabilities relied on a separate audio pipeline.

by Ashmeet Singh Doad
May 14, 2024
in Tech
Reading Time: 3 mins read

OpenAI has released its new flagship model, GPT-4o. The model can reason across audio, vision and text in real time, improving on earlier models whose voice capabilities depended on a separate audio pipeline.

GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction: the model accepts any combination of audio, text and images as input and can generate any combination of text, audio and images in response.

The new model can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT-4 Turbo performance on English text and code, with significant improvement on non-English text, while also being much faster and 50% cheaper in the API. GPT-4o is especially better than existing models at vision and audio understanding.
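
For developers, GPT-4o is exposed through the same Chat Completions API as earlier GPT-4 models. The snippet below is a minimal sketch, not taken from OpenAI's announcement, of sending a combined text-and-image prompt to gpt-4o with the official Python SDK; the image URL and the question are placeholder values.

```python
# Minimal sketch: a combined text + image request to GPT-4o via the
# OpenAI Python SDK. The image URL and question are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this chart?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/chart.png"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

Audio input and output are not part of this sketch, since real-time voice is rolling out separately over the coming weeks.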

Say hello to GPT-4o, our new flagship model which can reason across audio, vision, and text in real time: https://t.co/MYHZB79UqN

Text and image input rolling out today in API and ChatGPT with voice and video in the coming weeks. pic.twitter.com/uuthKZyzYx

— OpenAI (@OpenAI) May 13, 2024

Model Capabilities

The new model does more than answer the questions you type: you can interact with it. It can hold a back-and-forth conversation, a distinctly human skill, to work through a problem, and its voice changes with the flow of the conversation. On request it can speak faster or slower, shift its tone, and even sing.

Prior to GPT-4o, you could use Voice Mode to talk to ChatGPT with average latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4). That Voice Mode was a pipeline of separate models: one transcribed the audio into text, GPT-3.5 or GPT-4 took text in and produced text out, and another model turned that text back into audio. This led to a loss of information and of ‘human essence’: the model couldn’t detect the speaker’s tone, background noises, or emotions such as laughter or sadness.
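
To make that contrast concrete, here is a rough, purely illustrative sketch of how such a three-stage pipeline could be chained together from OpenAI's separate speech-to-text, chat and text-to-speech endpoints. The file names and voice are placeholders, and this is not OpenAI's actual Voice Mode implementation.

```python
# Rough sketch of the pre-GPT-4o Voice Mode idea: three separate models
# chained together, with the chat model only ever seeing plain text.
# File names and the voice are placeholders; this is not OpenAI's
# actual Voice Mode implementation.
from openai import OpenAI

client = OpenAI()

# 1. Transcribe the user's audio to text (tone, noise and laughter are lost here).
with open("question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=audio_file
    )

# 2. The chat model reasons over text only.
chat = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)
reply_text = chat.choices[0].message.content

# 3. A third model turns the text reply back into speech.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply_text)
with open("reply.mp3", "wb") as out:
    out.write(speech.content)
```

Because the model in the middle only ever sees the transcript, everything that isn't captured in that text is already gone by the second step.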

GPT-4o, by contrast, is a single model trained end-to-end across text, vision and audio, meaning all inputs and outputs are processed by the same neural network, unlike the previous pipeline of separate models.

As this is the first model to combine all of these input and output types in a single network, its capabilities and limitations are still being explored.

Model Evaluations

The new GPT-4o sets new high watermarks on multilingual, audio and vision capabilities.

GPT-4o sets a new high score of 87.2 per cent on 5-shot MMLU (general knowledge questions).
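
“5-shot” means the model is shown five solved examples before the question it is actually scored on. The snippet below is a simplified, hypothetical illustration of how such a prompt is assembled; the example questions are invented placeholders, not real MMLU items.

```python
# Simplified, hypothetical illustration of a 5-shot prompt: five worked
# examples precede the real question. The items below are invented
# placeholders, not actual MMLU questions.
few_shot_examples = [
    ("Which planet is closest to the Sun?", "Mercury"),
    ("What is 7 * 8?", "56"),
    ("Which gas do plants absorb during photosynthesis?", "Carbon dioxide"),
    ("Who wrote 'Romeo and Juliet'?", "William Shakespeare"),
    ("What is the chemical symbol for gold?", "Au"),
]

question = "Which element has atomic number 1?"

prompt = ""
for q, a in few_shot_examples:
    prompt += f"Q: {q}\nA: {a}\n\n"
prompt += f"Q: {question}\nA:"

print(prompt)  # this prompt would then be sent to the model being evaluated
```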

GPT-4o’s automatic speech recognition (ASR) performance beats Whisper-v3 in every region, with lower error rates across the board. It dramatically improves speech recognition over Whisper-v3 across all languages, particularly lower-resourced ones.

The model also outperforms Whisper-v3 in speech translation on the MLS benchmark.

The M3Exam benchmark is both a multilingual and a vision evaluation, consisting of multiple-choice questions from various countries’ standardized tests, some of which include figures and diagrams. GPT-4o is stronger than GPT-4 on this benchmark across all languages.

On visual perception benchmarks, GPT-4o performs better than other current models such as Gemini 1.0 Ultra, GPT-4T and Claude Opus.

Model Safety and Limitations

GPT-4o has safety built in by design across modalities, through techniques such as filtering training data and refining the model’s behaviour through post-training.

The evaluations of cybersecurity, CBRN, persuasion and model autonomy show that GPT-4o doesn’t score above medium risk in any category. GPT-4o has also undergone extensive external red teaming with 70+ external experts in domains such as social psychology, bias and fairness, and misinformation to identify risks that are introduced or amplified by the newly added modalities.

Over the coming weeks, OpenAI will work on the technical infrastructure, post-training usability and safety needed to release the remaining modalities.

GPT-4o Model Availability

GPT-4o’s text and image capabilities are starting to roll out today in ChatGPT. GPT-4o is available in the free tier and to Plus users with up to 5x higher message limits. A new version of Voice Mode with GPT-4o will roll out in alpha within ChatGPT Plus in the coming weeks.

Tags: AI, AI development, AI technology, ChatGPT, OpenAI