GPT-4o: The Next Step Towards a Seamless Human-Computer Interaction

Simon Orgulan
5 min
June 5, 2024

Everyone still remembers the end of 2022 when the first public release of ChatGPT took the world by storm. It was head and heels above what other AI tools could do at the time, and since that breakthrough day, many other AI tools have surfaced that have tried to either imitate it or come up with a whole new concept of using the power of AI for problem solving or as a creative powerhouse.

Fast forward to today, and GPT-4o is here, the anticipated upgrade that keeps blurring the boundaries between human and computer interaction. According to OpenAI, the milestone they were aiming for during development is to string together the dimensions of audio, video and text, in real-time, no less. But have they succeeded and what does this mean for the end-user going forward? Let’s find out!

It’s ChatGPT on steroids!

If you thought what ChatGPT could do previously was impressive, this will take your breath away. Even GPT-4, the previous benchmark milestone, had to rely on other third-party models to process sound, for example. In contrast, GPT-4o can do this natively, which means it can respond virtually in a fraction of a second. It supports 50+ languages in its current iteration, so if you were looking for accurate real-time translation (or even someone to talk to in a foreign language), this is as good as it gets.

GPT-4o is free of charge for all ChatGPT users

Despite being such a notable upgrade in terms of speed and accuracy, GPT-4o remains a free upgrade to ChatGPT. Now, if you’re a paid user of ChatGPT, you might be wondering what this means. The answer is that it still makes sense to retain your paid subscription, as paid users receive the perk of having up to five times the capacity limit. In other words, both subscription tiers still make a lot of sense, but now you might have a great reason to pull the trigger on this if you haven’t already.

Multimodal commands galore

Inputting text commands is the way we’re used to interacting with ChatGPT. But now, we have the option of accomplishing the same with either audio of image commands, both of which are a game-changer! Imagine having your Alexa-like companion at your fingertips, but with all the added generative AI arsenal. In fact, we want you to start thinking how you’re going to integrate this into your workflow and level up your creativity, all while boosting your productivity on the side.

A diagnostician’s best friend

It doesn’t take a lot to conclude that, eventually, AI will increasingly start to find its way into modern healthcare, as it also happens to be good at diagnosing illnesses based on symptoms described. This is not just from it being a massive time-saver; you have to keep in mind that diagnosticians are people. Thus, they don’t always get it right and are prone to inaccuracies. In addition to being a push-button diagnostic tool, it’s also incredibly useful for the purpose of analyzing samples and monitoring patients.

Your dream consultant/coworker/assistant is here

Although a considerable amount of the global population has heard about ChatGPT by now, apparently, less than 25% of Americans have incorporated it into their daily or professional lives. OpenAI is well aware of it, and that was the philosophy behind GPT-4o – to help them make the transition by sweetening the deal. By the looks of it, their mission was a success, as people are starting to realize the potential of this update and what it could do for their lives.

A knowledgeable friend that you can interact with in real-time

Having a conversation with GPT-4o feels virtually indistinguishable from chatting with a friend or a coworker. It can understand subtle linguistic nuances and even strike up a realistic-sounding conversation with you. You might be thinking it’s full of awkward moments and just quirks in general, but in reality, it’s even less awkward than certain people you’ll meet in real-life. So if you ever wanted a believable pretend-friend, there you go.

Forget the old robotic-sounding audio responses that we had to endure in the past decade or so. Not only does it instantly come up with a fitting response to your query or simply by listening and observing what’s going on, it also makes it sound very human-like. In other words, its audio capabilities are much more than just a robo-voice module slapped on top of its textual output. It can adapt to the pace of the voice conversation and even take subtle social cues into account. It truly makes one wonder whether businesses like Rentafriend will become a thing of the past because of this.

It’s absolutely unbelievable if you just think about the scope of data that it needs to process in the background to deliver such a high-level performance, but that’s the state of technology. Something like this would have appeared to have come from a sci-fi movie as little as a decade ago. But now it’s here in our hands and quite literally so.

What still stands in the way of mass-adoption?

Did you know that, when the original ChatGPT first came out, almost the entire internet knew about it in less than a week? News travels fast these days it seems, and even faster if it’s about a massive, game-changing technological breakthrough. One would think that mass adoption would follow as a result, but it didn’t quite go that way. And the reason is that ChatGPT, despite its potency, is far from perfect.

One of its major flaws is its tendency to make things up on the fly and try to pass it off as fact. For obvious reasons, this is a massive problem, particularly for academics and journalism where staying factually correct is of high importance. But again, OpenAI is one step ahead of us; even when we saw the update from 3.5 to 4.0 come live, these kinds of hallucinations were reported to decrease by 82%. With each notable milestone, OpenAI is committed to reduce it even more.

The average response time of GPT-4o is 320 miliseconds

Imagine what you could do with that! What projects you could bring to life! A real-time AI-powered instructor? If the educational material doesn’t require it to move around or use its hands, why not? You could even have it be your personal tutor that never gets tired, never loses patience with you and never asks for a penny. The key to remember is that GPT-4o can interpret audio, video, and textual input at the same time since it’s all part of the same model that it uses. That’s the key to its amazing response time.

Some ideas on what you can accomplish with GPT-4o

Now that you have a rough idea how it works, it’s time to get your creative juices flowing and put your thinking hat on. To give you some inspiration, we’ve prepared some concrete uses cases to do with as you see fit:

- Have it answer your questions in real-time and be your Q&A assistant. Use it for therapy, training, tutoring, counseling… you name it.

- Have it be your coffee buddy. This one is kind of embarrassing perhaps, but then again, science has proven time and time again that people are social creatures and that socializing is a fundamental component of our health and wellbeing. Perhaps, at some point in the future, using GPT-4o for this kind of purpose will become more mainstream and consequently less awkward.

- Utilize it as your book summary assistant. Did you know that book summaries are a niche on Youtube? However, the problem with these videos is that you can’t instruct them to tell you more about this or that chapter as they are pre-rendered. GPT-4o, on the other hand, can give you a summary and then be more than willing to laser focus on a particular part of the book, for example, if you want to learn about it more in-depth.

- Let it analyze your reports for you. Do you need to extract something specific for your boss or reach a conclusion? Upload the file, let it analyze it, and then have a conversation with it to interpret the findings.

- Place it in the role of a real-time translator. Technically, Google Translate can do this as well, but it’s not anywhere near as handy or convenient, so GPT-4o might be the better alternative.


So yes, GPT-4o is here with its wonderful new input and output capabilities. The key thing to note is that it has the capacity to surpass the gap between various file formats, types of input, and even forms of expression, as it’s now a giant brain that seamlessly connects all the dots like a gigantic central metro station. Now it’s on you to figure out what your desired output should be and work it out from there.

Join 100,000+ businesses using Ocoya