Adobe Answers the Sora Question & New SD3 Features!

Theoretically Media
18 Apr 2024 15:52

TLDR: In this video, the host discusses several advancements in AI technology. They start with the release of Stable Diffusion 3, which excels in prompt understanding and text generation within images. The host highlights new features like a creative upscaler, search and replace, and background removal. They also compare Stable Diffusion 3 with Stable Diffusion XL, noting the improved clarity and detail in the former. Next, they discuss Adobe's integration of generative AI video features into Adobe Premiere, including object removal and addition, and generative extend. Adobe's commitment to an open ecosystem and partnerships with AI model providers is emphasized. The host then explores Microsoft's entry into the AI avatar space with VASA-1, which uses audio to drive realistic facial expressions and lip sync. Lastly, they mention their experience as a judge in the first AI art esports competition, which utilized Leonardo's real-time drawing tool for a unique and exciting event.

Takeaways

  • 📈 Stable Diffusion 3 has been released with improved prompt understanding and text-to-image generation capabilities.
  • 🔍 The new version includes features like a creative upscaler up to 4K, in-painting, out-painting, search and replace, and background removal.
  • 🎨 There's an image-to-video feature that connects with Stable Diffusion Video, and an image-to-image editing option with a mask for detailed modifications.
  • 📚 Adobe discussed their upcoming generative AI video features for Adobe Premiere, focusing on object removal, addition, and generative extend functionalities.
  • 🤖 Microsoft entered the AI avatar market with VASA-1, which generates lifelike, audio-driven talking faces in real time with impressive lip sync and facial nuances.
  • 🔗 Adobe is working on integrating third-party AI models, like OpenAI's Sora, into Premiere Pro for more specialized use cases.
  • 📹 Adobe is also exploring generative audio to accompany video clip extensions, offering more control and granularity over the output.
  • 📸 Adobe's Firefly video model will continue to be commercially safe, and any third-party models used will receive content credentials.
  • 📊 Adobe is aiming to improve smart and manual masking tools within Premiere, enhancing the user experience for video editing.
  • 🎮 The first AI art esports competition was held, showcasing the potential for community events centered around AI technology.
  • 🌟 The event featured real-time drawing using Leonardo's tool, where contestants had to generate images based on prompts in just one minute.

Q & A

  • What was the main focus of the NAB convention in Las Vegas that the speaker attended?

    - According to the speaker, the main focus of the NAB convention in Las Vegas was generative AI and collaboration, with particular emphasis on Adobe's new AI video generation features and their integration into Adobe Premiere.

  • What are the two versions of Stable Diffusion 3 mentioned in the transcript?

    - The two versions of Stable Diffusion 3 mentioned are Stable Diffusion 3 and Stable Diffusion 3 Turbo.

  • What new feature does Stable Diffusion 3 offer that is considered a time-saver for image editing?

    - Stable Diffusion 3 offers a 'remove background' feature built right in, which is considered a time-saver as it eliminates the need for manual masking tools.

  • How does Stable Diffusion 3 improve upon its predecessor, Stable Diffusion XL, in terms of image generation?

    - Stable Diffusion 3 improves upon Stable Diffusion XL by offering better prompt understanding, sharper and crisper images, more detailed patterns in the generated images, and a more accurate representation of objects as described in the prompt.

  • What is the name of the new platform announced by Stability AI?

    - The new platform announced by Stability AI is called Stable Assistant Beta.

  • What kind of access does the Stable Assistant Beta offer to its paying subscribers?

    - Stable Assistant Beta offers paying subscribers access to the latest models, the ability to generate images, write content, and match photos to text through conversation.

  • What is the significance of the Firefly image model mentioned in the context of Adobe?

    - The Firefly image model is part of Adobe's collection of AI models, and the company is building its video model within the same family. It signifies Adobe's commitment to integrating advanced AI capabilities into its creative software suite.

  • What are some of the AI video generation features that Adobe is planning to integrate into Adobe Premiere?

    - Adobe is planning to integrate features such as object removal, object addition, and generative extend, which lengthens a clip (for example, so that a cut can match the beat of the music) and gives editors more control over video editing through AI.

  • How does Microsoft's VASA-1 differ from other AI avatar models?

    - Microsoft's VASA-1 differs from other models in that it is audio-driven rather than text-based, which allows for more natural and realistic lip sync and a wider range of facial nuances and head motions.

  • What was the role of the speaker in the first AI art esports competition?

    - The speaker served as a judge in the first AI art esports competition, which was organized by Creative Refuge.

  • What is the potential impact of community events based around AI technology like the AI art esports competition?

    - Community events based around AI technology, like the AI art esports competition, illustrate the potential for AI to bring people together in creative and competitive ways, fostering a sense of community and showcasing the capabilities of AI in a fun and engaging format.

Outlines

00:00

🚀 Stable Diffusion 3 Release and Features

The video begins with the host returning from the NAB convention in Las Vegas and immediately diving into the latest developments in AI. The main highlight is the release of Stable Diffusion 3, which is praised for its prompt understanding and text-to-image generation capabilities. The host discusses the new features of Stable Diffusion 3, including a creative upscaler that can reach up to 4K resolution, in-painting and out-painting tools, search and replace functionality, background removal, and an image-to-video feature. A comparison is made between Stable Diffusion 3 and Stable Diffusion XL, showcasing the improved clarity and detail in the examples provided. The host also mentions Stable Assistant Beta, a new platform offering a chatbot service for generating images, writing content, and matching photos to text through conversation, with pricing plans available for subscribers.

05:00

🖥️ Adobe Premiere's AI Video Generation Features

The host follows up on a previous video by discussing Adobe's announcement of generative AI video features in Adobe Premiere. An interview with Kyle from Adobe provides insights into the upcoming video model, which will allow for object removal, object addition, generative extend, and other AI-powered editing capabilities. The conversation touches on Adobe's open ecosystem and partnerships with other AI model providers, such as OpenAI (with its Sora model), Runway ML, and Pika. The host inquires about the level of control users will have over the output and the potential for a standalone platform for video, to which Kyle responds that the features will be integrated within the Adobe ecosystem. They also discuss the commercial safety of third-party models and the importance of content credentials for transparency in media generation.

10:00

🤖 Microsoft's VASA-1: AI Avatars with Real-Time Audio-Driven Faces

The host introduces Microsoft's entry into the AI avatar space with VASA-1, which generates lifelike, audio-driven talking faces in real time. Unlike text-based models, VASA-1 is noted for its impressive lip sync and the natural range of facial expressions and head movements it can produce. The host shares examples of VASA-1 in action, highlighting the realistic feel when the avatars are driven by actual recorded audio. The host also notes some minor issues with the technology, such as occasional jerky head movements and hair that doesn't track perfectly with the head. The segment ends with a mention of camera controls within VASA-1 and the potential use of the technology in applications like Microsoft Teams.

15:04

🎮 First AI Art Esports Competition

The host shares his experience as a judge in the first AI art esports competition organized by Creative Refuge. The event involved contestants using Leonardo's real-time drawing tool to create images based on prompts given by the audience and selected by the judges. The host describes the competition as more exciting and nail-biting than expected, emphasizing the potential for community events centered around AI technology. He encourages viewers to check out Creative Refuge's channel for a full rundown of the event and concludes the video by expressing his exhaustion from the NAB convention in Las Vegas.

Keywords

💡NAB convention

The NAB convention, or National Association of Broadcasters convention, is a major trade show and professional development conference for the media and entertainment industry. In the script, the speaker has just returned from this event in Las Vegas, indicating that the news and updates discussed are likely to be significant and relevant to the industry.

💡AI video generation features

AI video generation features refer to the use of artificial intelligence to create or enhance video content. In the context of the video, Adobe is integrating these features into Premiere, which is a professional video editing software. This suggests that users will be able to generate video content using AI, making the editing process more efficient and creative.

💡Microsoft AI avatar

An AI avatar, in this case developed by Microsoft and referred to as VASA-1, is a digital representation of a human that can interact in a virtual environment. The script mentions that Microsoft has entered the AI avatar game with impressive results: these avatars are lifelike, audio-driven talking faces generated in real time, a significant advancement in virtual communication and representation.

💡Stable Diffusion 3

Stable Diffusion 3 is an AI model for generating images from text prompts. It is mentioned in the script as having improved capabilities over its predecessor, including better prompt understanding and text-to-image generation. The model is significant as it represents the latest advancements in AI image generation technology.

💡Adobe Premiere

Adobe Premiere is a widely used video editing software. In the script, it is highlighted that Adobe is answering questions about new AI video generation features coming to Premiere. This indicates that AI technology is becoming more integrated into professional video editing tools, which could greatly impact the capabilities and workflow of video editors.

💡Stable Assistant Beta

Stable Assistant Beta is a platform mentioned in the script that allows paying subscribers to access the latest AI models for generating images and writing content. It represents the commercialization and accessibility of AI technology for creative tasks, suggesting a future where AI assistance is commonplace in content creation.

💡AI art esports competition

An AI art esports competition, as described in the script, is an event where contestants use AI tools to create art in real time, often based on prompts. The speaker was a judge in the first such competition, which used Leonardo's real-time drawing tool. This concept showcases the potential for AI not only in professional settings but also in community and competitive events.

💡Firefly image model

The Firefly image model is part of Adobe's collection of AI models. It is mentioned in the context of Adobe's efforts to integrate AI into their video editing capabilities. The Firefly model is significant as it represents Adobe's commitment to leveraging AI technology to enhance their creative software offerings.

💡Content credentials

Content credentials, as discussed in the script, are like a 'nutrition label' for AI-generated media. They provide information about whether the media is entirely AI-generated or just modified, and which model was used in its creation. This is important for transparency and understanding the origins and nature of the content.

💡Smart masking

Smart masking is a feature that allows for more efficient and accurate selection of objects within video editing software. In the context of the video, Adobe is working on integrating smart masking into Premiere, which will improve the object removal process. This highlights the ongoing development of AI-assisted tools to streamline and enhance video editing.

💡AI-generated lip sync

AI-generated lip sync is a technology that synchronizes the movements of a character's lips with the audio in a video. The script mentions VASA-1's impressive lip sync capabilities, which are driven by actual recorded audio to create a more natural and realistic appearance. This technology is significant for the creation of realistic AI avatars and virtual characters.

Highlights

Stable Diffusion 3 has been released with improved prompt understanding and better generation of text within images.

Stable Diffusion 3 offers two versions: standard and 'turbo', with the latter being optimized for speed.

The new version includes a creative upscaler that can upscale images up to 4K resolution.

In-painting and out-painting features have been enhanced with a search and replace function that doesn't require a mask.

A built-in background removal feature is included for convenience.

Image-to-video functionality connects directly to Stable Diffusion Video.

Image-to-image editing with a mask allows for text or image prompts to modify specific parts of an image.

Comparisons between Stable Diffusion 3 and XL show significant improvements in image quality and detail.

Adobe answers questions about new AI video generation features coming to Premiere Pro.

Adobe's generative AI video features will allow object removal, addition, and generative extend functionalities.

Adobe is working with big video AI model providers such as OpenAI (with its Sora model), Runway ML, and Pika.

Stable Assistant Beta is a new platform offering a friendly chatbot for subscribers to generate images and content.

Microsoft has entered the AI avatar game with VASA-1, an audio-driven, lifelike talking face generation system.

VASA-1's lip sync and facial nuances contribute to a high level of authenticity in the generated avatars.

The first AI art esports competition was held, showcasing the potential for community events based on AI technology.

The competition used Leonardo's real-time drawing tool, emphasizing the potential for creative applications of AI.

Adobe is focusing on bringing smart masking and improved manual masking tools to Premiere Pro.

Generative audio will be included alongside clip extensions in Adobe's upcoming features.