At its annual "Google I/O" keynote, the Mountain View firm shared the latest updates from DeepMind, its AI division. Video professionals paid particular attention to Veo, presented as Google's most advanced video generation model.
Images, music, video: Google is injecting AI everywhere. Objective: "Make AI useful for everyone." It is a clear message to its competitors, most notably OpenAI, which has just unveiled GPT-4o, its latest generative AI model.
With Veo, Google aims to rival OpenAI's Sora
From text, image and video prompts, Veo can generate high-quality 1080p videos that can run longer than a minute, in a wide variety of cinematic and visual styles. "Our video generation model will help create tools that make video production accessible to everyone. Whether you're a seasoned filmmaker, an aspiring creator or an educator looking to share your knowledge, Veo opens up new possibilities in storytelling, education and more," says Google DeepMind.
For more precise rendering, Google's video creation tool has a deep understanding of cinematic technical vocabulary. "Thanks to an advanced understanding of natural language and visual semantics, it can generate videos that faithfully represent the user's creative vision, accurately capturing the tone of a prompt and rendering details in longer prompts. The model also understands cinematic terms such as 'timelapse' or 'aerial shots of a landscape', offering an unprecedented level of creative control. And it creates coherent, consistent sequences, so that people, animals and objects move realistically throughout the shots," reads the Google blog.
On its blog, Google said it was already planning to port some of Veo's features "to YouTube Shorts and other products".
How does Veo work?
Veo builds on Google's earlier work in video generation, such as Generative Query Network (GQN), DVD-GAN, Imagen-Video, Phenaki, WALT, VideoPoet and Lumiere, as well as Google's Transformer architecture and Gemini.
This technology enables it to generate sequences that are "coherent and homogeneous, so that people, animals and objects move realistically throughout the shots", features that make the tool particularly competitive with today's leading video generation models: not only Sora, but also models from startups such as Pika, Runway or Irreverent Labs.
To facilitate identification and limit the risk of misuse, videos generated by Veo will be traceable thanks to SynthID, a digital watermarking technology developed by DeepMind.
The video generation model will be made available via VideoFX before the end of 2024. Some filmmakers and creators can already preview the tool. To test Veo's capabilities, Google collaborated with filmmaker Donald Glover and his creative studio, Gilga. They used Veo to explore various creative techniques, including dynamic tracking shots, which require precise movement and consistent framing. The result is a promising video that reveals the power of the tool.
Where can I access Veo?
In the coming weeks, Google is set to offer some of Veo's features to selected creators via VideoFX, a new tool available on labs.google. This initiative provides early access to Veo's advanced video generation capabilities, giving creators the chance to experiment with its innovative features. The waiting list for Veo is currently open. The technology is not yet available in France.
Google I/O 2024: artificial intelligence is everywhere
In addition to Veo, DeepMind presented several updates in generative AI. Douglas Eck, Director of Research at Google, presented Imagen 3, the American firm's most advanced text-to-image model to date. According to Douglas Eck, "Imagen 3 excels at creating photorealistic, lifelike images. Thanks to its deep understanding of natural language prompts, the tool is able to capture complex details while minimizing visual artifacts." Visuals generated by Imagen 3 will also be identifiable thanks to SynthID integration. It is already possible to submit a few prompts to Imagen 3 on Google Labs.
On the music front, DeepMind, in partnership with YouTube, has unveiled Music AI Sandbox, a suite of AI tools that can create music tracks from a text description or change the style of a melody in a matter of seconds.