Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
Shortly after news spread that Google was delaying the release of Gemini, its long-awaited AI model, the company announced its launch.
Alongside the launch, Google released a demo showcasing Gemini's impressive, even downright amazing, capabilities. Well, you know what they say about things that seem too good to be true.
Let’s explore what went wrong with the demo and how it compares to OpenAI.
Rivaling OpenAI's GPT-4, Gemini is a multimodal AI model, meaning it can process text, images, audio, and code as input.
(For a long time, ChatGPT was unimodal and processed only text; it became multimodal only this year.)
Gemini comes in three versions: Ultra, Pro, and Nano.
Ultra isn't yet available to consumers, with a rollout scheduled for early 2024 as Google conducts final tests to ensure it's safe for commercial use. Gemini Nano will power Google's Pixel 8 Pro phone, which has built-in AI features.
Gemini Pro, meanwhile, powers Google tools such as Bard starting today and is available via API through Google AI Studio and Google Cloud Vertex AI.
Google released a six-minute YouTube demo of Gemini’s skills in language, game creation, logic and spatial thinking, cultural understanding and more.
If you watch the video, it's easy to be amazed.
Gemini recognizes a duck from a simple drawing, figures out a sleight-of-hand trick, and solves visual puzzles, to name just a few tasks.
However, after the video had earned more than 2 million views, a Bloomberg report revealed that it had been cut and spliced in ways that inflated Gemini's performance.
Google shared a disclaimer at the start of the video: “For the purposes of this demonstration, latency has been reduced and Gemini outputs have been truncated for brevity.”
Bloomberg points out, though, that Google left out several important details.
Most notably, Gemini didn't respond to the live video and voice the demo suggests; it actually received its inputs as still photos and written instructions.
It's like showing everyone a video of your dog's best trick.
Text the clip around and everyone is impressed. Then they learn that getting the dog to perform it took a pile of treats, plenty of petting and patience, and a hundred repetitions.
Let’s make a side-by-side comparison.
In an 8-second clip from the demo, a person gestures with his hand as if playing the game used to settle friendly disputes everywhere. Gemini replies, "I know what you're doing. You're playing rock-paper-scissors."
But what actually happened behind the scenes involved a lot more spoon-feeding.
In the actual session, a user submitted each hand gesture as a separate image and asked Gemini to describe what it saw.
From there, the user combined all three images, asked Gemini again, and included a big hint.
While it’s still impressive how well Gemini can process images and understand context, the video downplays how much control is required for Gemini to generate the right response.
While this has drawn plenty of criticism for Google, some point out that it's not uncommon for companies to use editing to create more seamless, idealized use cases in their demos.
Until now, OpenAI's GPT-4 has been the most powerful AI model on the market, and Google and other AI players have been working hard to build a model that can beat it.
Google first teased Gemini in September, suggesting it would beat GPT-4, and technically, it did.
Gemini outperforms GPT-4 on a number of benchmarks used by AI researchers.
However, the Bloomberg article points out something important.
For a model that took so long to release, the fact that it’s only marginally better than GPT-4 isn’t the big win Google was aiming for.
OpenAI released GPT-4 in March. Google has now released Gemini, which beats it, but only by a few percentage points.
So how long will it take for OpenAI to release an even bigger and better version? Judging by last year, it probably won’t be long.
For now, Gemini seems to be the better option, but that won't be clear until early 2024, when Ultra comes out.