New Delhi: Google has introduced its newest artificial intelligence model, Lumiere, which specializes in multimodal video generation and can produce five-second videos.
Lumiere employs a Space-Time U-Net (STUNet) architecture, which enhances the realism of motion in generated videos. Unlike traditional methods that assemble still frames, Lumiere creates videos in a single process, simultaneously addressing spatial and temporal elements to mimic natural motion.
The model generates 80 frames per video, a significant increase over Stable Video Diffusion's 25 frames, and combines spatial and temporal down- and up-sampling with a pre-trained text-to-image diffusion model to generate video.
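To illustrate the idea of processing a video jointly in space and time, the sketch below shows strided down-sampling and nearest-neighbour up-sampling applied to a clip's temporal and spatial axes at once. This is a minimal, hypothetical illustration of the general space-time down/up-sampling concept, not Google's actual STUNet implementation; the array shapes and factors are assumptions chosen for the example.

```python
import numpy as np

def spacetime_downsample(video, t_factor=2, s_factor=2):
    # Stride over the time axis and both spatial axes together,
    # compressing the clip jointly in space and time.
    return video[::t_factor, ::s_factor, ::s_factor]

def spacetime_upsample(video, t_factor=2, s_factor=2):
    # Nearest-neighbour expansion along time, height, and width
    # to return the clip to its original resolution and length.
    return (video.repeat(t_factor, axis=0)
                 .repeat(s_factor, axis=1)
                 .repeat(s_factor, axis=2))

# An 80-frame clip, matching the frame count the article cites;
# the 128x128 spatial size is an arbitrary choice for the demo.
clip = np.zeros((80, 128, 128))
coarse = spacetime_downsample(clip)   # shape (40, 64, 64)
restored = spacetime_upsample(coarse)  # shape (80, 128, 128)
```

In a real diffusion U-Net these resampling steps would be interleaved with learned convolution and attention layers; the point here is only that time is treated as an axis to compress and expand alongside the spatial ones, rather than frame by frame.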
While not yet available for public testing, the Lumiere website showcases videos created with the AI, along with their text prompts and input images. It demonstrates various video styles, cinemagraphs, and inpainting features.
Lumiere competes with existing models such as Runway Gen-2 and Pika 1.0, both of which are publicly accessible and offer features such as video editing and short-clip generation.