At the Google I/O developer conference in May 2023, CEO Sundar Pichai announced the company’s upcoming artificial intelligence (AI) system, Gemini.
The large language model (LLM) is being developed by the Google DeepMind division (Brain Team + DeepMind). It could compete with AI systems like ChatGPT from OpenAI and possibly outperform them.
While details remain scarce, here is what we can piece together from the latest interviews and reports about Google Gemini.
Google Gemini Will Be Multimodal
Pichai stated that Gemini combines the strengths of DeepMind’s AlphaGo system, known for mastering the complex game Go, with extensive language modeling capabilities.
He said it is designed from the ground up to be multimodal, integrating text, images, and other data types. This could allow for more natural conversational abilities.
Pichai also hinted at future capabilities like memory and planning that could enable tasks requiring reasoning.
Gemini Can Use Tools And APIs
In an update to his professional bio over the summer, Google Chief Scientist Jeffrey Dean said Gemini is one of the “next-generation multimodal models” he is co-leading.
He stated it will utilize Pathways, Google’s new AI infrastructure, to enable scaling up training on diverse datasets.
This hints at Gemini potentially being the largest language model created to date, likely exceeding the size of GPT-3 with over 175 billion parameters.
It Will Come With Various Sizes And Capabilities
Additional details came from Demis Hassabis, CEO of DeepMind.
In June, he told Wired that techniques from AlphaGo, like reinforcement learning and tree search, may give Gemini new abilities like reasoning and problem-solving.
Hassabis stated Gemini is a “series of models” that will be made available in different sizes and capabilities.
He also mentioned Gemini may utilize memory, fact-checking against sources like Google Search, and improved reinforcement learning to enhance accuracy and reduce hazardous hallucinated content.
Early Gemini Results Are Promising
In a September Time interview, Hassabis reiterated that Gemini aims to combine scale and innovation.
He said incorporating planning and memory is in the early exploratory stages.
Hassabis also stated Gemini may employ retrieval methods to output entire blocks of information, rather than word-by-word generation, to improve factual consistency.
He revealed that Gemini builds on DeepMind’s multimodal work like the image captioning system Flamingo.
Overall, Hassabis said Gemini is showing “very promising early results.”
Advanced Chatbots As Universal Personal Assistants
In an interview with Wired, published a few days later, Pichai provided the most unambiguous indication of how Gemini fits into Google’s product roadmap.
He stated conversational AI systems like Bard are “not the end state” but waypoints leading towards more advanced chatbots.
Pichai said Gemini and future iterations will ultimately become “incredible universal personal assistants” integrated throughout people’s daily lives in areas like travel, work, and entertainment.
He reiterated that Gemini will combine strengths of text and images, stating that today’s chatbots will “look trivial” in comparison within a few years.
Competitors Are Interested In Gemini’s Performance
OpenAI CEO tweeted what appeared to be a response to a paywalled-article reporting that Google Gemini could outperform GPT-4.
Are the numbers wrong?
— Elon Musk (@elonmusk) August 30, 2023
There was no official response to the follow-up question by Elon Musk on whether the numbers provided by SemiAnalysis are correct.
Select Companies Have Early Access To Gemini
This suggests Gemini may soon be ready for a beta release and integration into services like Google Cloud Vertex AI.
Meta Working On LLM To Compete With OpenAI
While the news about Gemini is promising thus far, Google isn’t the only company reportedly ready to launch a new LLM to compete with OpenAI.
According to the Wall Street Journal, Meta is also working on an AI model that would compete with the GPT model that powers ChatGPT.
Meta most recently announced the release of Llama 2, an open-source AI model, in partnership with Microsoft. The company appears dedicated to responsibly creating AI that is more accessible.
The Countdown To Google Gemini
What we know so far indicates Gemini could represent a significant advancement in natural language processing.
The fusion of DeepMind’s latest AI research with Google’s vast computational resources makes the potential impact challenging to overstate.
If Gemini lives up to expectations, it could drive a change in interactive AI, aligning with Google’s ambitions to “bring AI in responsible ways to billions of people.”
The latest news from Meta and Google comes a few days after the first AI Insight Forum, where tech CEOs privately met with a portion of the United States Senate to discuss the future of AI.
Featured image: VDB Photos/Shutterstock