
Google’s Gemini updates: How Project Astra is powering some of I/O’s big reveals



Google is improving its AI-powered chatbot Gemini so that it can better understand the world around it — and the people conversing with it.

At the Google I/O 2024 developer conference on Tuesday, the company previewed a new experience in Gemini called Gemini Live, which lets users have “in-depth” voice chats with Gemini on their smartphones. Users can interrupt Gemini while the chatbot’s speaking to ask clarifying questions, and it’ll adapt to their speech patterns in real time. And Gemini can see and respond to users’ surroundings, either via photos or video captured by their smartphones’ cameras.

“With Live, Gemini can better understand you,” Sissie Hsiao, GM for Gemini experiences at Google, said during a press briefing. “It’s custom-tuned to be intuitive and have a back-and-forth, actual conversation with [the underlying AI] model.”

Gemini Live is in some ways the evolution of Google Lens, Google’s long-standing computer vision platform to analyze images and videos, and Google Assistant, Google’s AI-powered, speech-generating and -recognizing virtual assistant across phones, smart speakers and TVs.

At first glance, Live doesn’t seem like a drastic upgrade over existing tech. But Google claims it taps newer techniques from the generative AI field to deliver superior, less error-prone image analysis — and combines these techniques with an enhanced speech engine for more consistent, emotionally expressive and realistic multi-turn dialogue.

“It’s a real-time voice interface and [has] extremely powerful multimodal capabilities combined with long context,” Oriol Vinyals, principal scientist at DeepMind, Google’s AI research division, told TechCrunch in an interview. “You could imagine how that combination will feel very powerful.”

The technical innovations driving Live stem in part from Project Astra, a new initiative within DeepMind to create AI-powered apps and “agents” for real-time, multimodal understanding.

“We’ve always wanted to build a universal agent that will be useful in everyday life,” Demis Hassabis, CEO of DeepMind, said during the briefing. “Imagine agents that can see and hear what we do, better understand the context we’re in and respond quickly in conversation, making the pace and quality of interactions feel much more natural.”

Gemini Live — which won’t launch until later this year — can answer questions about things within view (or recently within view) of a smartphone’s camera, like which neighborhood a user might be in or the name of a part on a broken bicycle. Pointed at a portion of computer code, Live can explain what that code does. Or, asked about where a pair of glasses might be, Live can say where it last “saw” the glasses.


Live is also designed to serve as a virtual coach of sorts, helping users rehearse for events, brainstorm ideas and so on. Live can suggest which skills to highlight in an upcoming job or internship interview, for instance, or give public speaking advice.

“Gemini Live can provide information more succinctly and answer more conversationally than, for example, if you’re interacting in just text,” Hsiao said. “We think that an AI assistant should be able to solve complex problems … and also feel very natural and fluid when you engage with it.”

Gemini Live’s ability to “remember” is made possible by the architecture of the model underpinning it: Gemini 1.5 Pro (and to a lesser extent other “task-specific” generative models), which is the current flagship in Google’s Gemini family of generative AI models. It has a longer-than-average context window, meaning it can take in and reason over a lot of data — about an hour of video (RIP, smartphone batteries) — before crafting a response.

“That’s hours of video that you could have interacting with the model, and it would remember all that has happened before,” Vinyals said.

Live is reminiscent of the generative AI behind Meta’s Ray-Ban glasses, which similarly can look at images captured by a camera and interpret them in near-real time. Judging from the pre-recorded demo reels Google showed during the briefing, it’s also quite similar — conspicuously so — to OpenAI’s recently revamped ChatGPT.

One key difference between the new ChatGPT and Gemini Live is that Gemini Live won’t be free. Once it launches, Live will be exclusive to Gemini Advanced, a more sophisticated version of Gemini that’s gated behind the Google One AI Premium Plan, priced at $20 per month.

Perhaps in a jab at Meta, one of Google’s demos showed a person wearing AR glasses equipped with a Gemini Live-like app. Google — doubtless keen to avoid another dud in the eyewear department — declined to say whether those glasses or any glasses powered by its generative AI would come to market in the near future.

Vinyals didn’t completely shut down the idea, though. “We’re still prototyping and, of course, showcasing [Astra and Gemini Live] to the world,” he said. “We’re seeing the reaction from folks that can try it, and that will inform where we go.”

Other Gemini updates

Beyond Live, Gemini is getting a range of upgrades to make it more useful day-to-day.

Gemini Advanced users in more than 150 countries and over 35 languages can take advantage of Gemini 1.5 Pro’s larger context to have the chatbot analyze, summarize and answer questions about long (up to 1,500 pages) documents. (While Live is arriving later in the year, Gemini Advanced users can interact with Gemini 1.5 Pro starting today.) Documents can now be imported from Google Drive or uploaded directly from a mobile device.

Later this year for Gemini Advanced users, the context window will grow even larger — to 2 million tokens — and bring with it support for uploading videos (up to two hours in length) to Gemini and having Gemini analyze big codebases (more than 30,000 lines of code). 

Google claims that the large context window will improve Gemini’s image understanding. For example, given a photo of a fish dish, Gemini will be able to suggest a comparable recipe. Or, given a math problem, Gemini will provide step-by-step instructions on how to solve it. 

And it’ll help Gemini plan trips.


In the coming months, Gemini Advanced will gain a new “planning experience” that creates custom travel itineraries from prompts. Taking into account things like flight times (from emails in a user’s Gmail inbox), meal preferences and information about local attractions (from Google Search and Maps data), as well as the distances between those attractions, Gemini will generate an itinerary that updates automatically to reflect any changes. 

In the more immediate future, Gemini Advanced users will be able to create Gems, custom chatbots powered by Google’s Gemini models. Along the lines of OpenAI’s GPTs, Gems can be generated from natural language descriptions — for example, “You’re my running coach. Give me a daily running plan” — and shared with others or kept private. No word on whether Google plans to launch a storefront for Gems like OpenAI’s GPT Store; hopefully we’ll learn more as I/O goes on.

Soon, Gems and Gemini proper will be able to tap an expanded set of integrations with Google services, including Google Calendar, Tasks, Keep and YouTube Music, to complete various labor-saving tasks.


“Let’s say you have a flier from your kid’s school, and there’s all these events that you want to add to your personal calendar,” Hsiao said. “You’ll be able to take a picture of this flier and ask the Gemini app to create these calendar entries directly onto your calendar. This is going to be a great time saver.”

Given generative AI’s tendency to get summaries wrong and generally go off the rails (plus Gemini’s not-so-glowing early reviews), take Google’s claims with a grain of salt. But if the improved Gemini and Gemini Advanced actually perform as Hsiao describes — and that’s a big if — they could be great time savers indeed. 

