News Analysis

Google Revises Image Generation for Gemini as It Sets a Relaunch

Pierre DeBois
Google's AI roller coaster hits a snag as it pauses Gemini's image generation, affecting its stock and its race for AI dominance.

The Gist

  • AI pause. Google halts image generation in Gemini, adding a chapter to its AI saga and raising questions about its future.
  • Stock impact. Stock dips as Google pauses Gemini's image generation feature, highlighting the tech giant's challenges.
  • Relaunch effort. Google races to relaunch Gemini amidst AI competition, aiming to regain its footing in the AI landscape.

The AI saga continues for Google. 

In early February, Google introduced image generation via its Gemini model. However, on Feb. 22, Google withdrew image generation from Gemini's features after social media users pointed out inaccuracies in some historical depictions generated by the model.

According to Reuters, Google DeepMind CEO Demis Hassabis commented on the feature’s return. While on a panel during the Mobile World Congress in Barcelona, Hassabis said that Google expects to relaunch the feature in the next few weeks.

Image generation has been instrumental to generative AI's influence on content creation. It is used for art creation, concept illustration and product imagery. Generative AI tools struggled to render human images convincingly last year, but marketplace expectations were high for Google Gemini because of the promising improvements its architecture brought.


How Does the Image Generation Setback Affect Google's Image?

The feature setback is significant for a myriad of reasons.

Google's pause comes at a crucial time, when AI model providers are rushing to incorporate multimodal features, meaning the ability to handle and process information from multiple media types, including audio, video and images. Multimodality is highly desired because it can attract users to a single AI resource that serves their complex prompts well. Many AI researchers and users are finding that chain-of-thought prompting techniques improve when additional media help describe the desired output, making multimodal approaches to prompts increasingly popular.

The pause is also pivotal as Google works to minimize the financial market blowback. On Monday, Feb. 26, Google’s stock declined 4.4%.

Ironically, Google experienced a similar stock market reaction almost exactly a year ago when it was launching Bard, the predecessor to Gemini. In a promotional video that touted Bard as experimental, Bard incorrectly credited the James Webb Space Telescope (JWST) with taking the first pictures of a planet outside the Earth's solar system; the first such image was actually captured by the European Southern Observatory's Very Large Telescope (VLT) in 2004. News of this error caused Google's share price to drop by 9%.

Pausing image generation is also a move to protect the branding investment Google made in renaming Bard as Gemini, capitalizing on the budding accolades the company was receiving.



AI Ain’t Perfect

Since their arrival in the marketplace, generative AI systems have occasionally shown how prone their algorithms are to bias and error. Biases inadvertently introduced through training data can skew results and produce mishandled content such as sexualized images or plagiarized text.

The risk extends to the corporations that release generative AI systems as well. For example, Microsoft released a chatbot named Tay on Twitter in 2016 that began generating racist content within hours. Microsoft quickly shut it down.



What’s Next for Google Gemini?

Google will likely recover from this misstep and the steep stock price hits. Many image generators are still niche sites that rely on prompts with photography terminology for the best responses. Although OpenAI's ChatGPT made a significant impact with its incorporation of DALL-E, Google has the opportunity to catch up if it executes its corrections effectively.

Google's introduction of subscription services means that customers now have higher expectations for reliability, unlike with Bard, which was cautiously touted as experimental. This year, Gemini must position itself as a highly desirable product that comprehensively serves a vast number of customers, comparable to Google's own successful search engine.

Google faces intense competition for feature attention among AI users, as OpenAI touts Sora, its text-to-video creator. Even Elon Musk is promoting Grok to gain attention for X's efforts in the AI space.

Industry analysts consider Google to be lagging in transforming its innovations into workable solutions. Although Google's stock price is still higher than it was a year ago, its year-to-date performance lags behind the S&P 500 Index.

For now, image generation is paused, but Google believes its quest to conquer the multimodal realm of the AI kingdom has only just begun.

About the Author

Pierre DeBois

Pierre DeBois is the founder and CEO of Zimana, an analytics services firm that helps organizations achieve improvements in marketing, website development, and business operations. Zimana has provided analysis services using Google Analytics, R Programming, Python, JavaScript and other technologies where data and metrics abide.

Main image: harlequin9 on Adobe Stock Photos